1

Task Performance with List-Mode Data

Caucci, Luca, January 2012
This dissertation investigates the application of list-mode data to detection, estimation, and image reconstruction problems, with an emphasis on emission tomography in medical imaging. We begin by introducing a theoretical framework for list-mode data and use it to define two observers that operate on list-mode data. These observers are applied to the problem of detecting a signal (known in shape and location) buried in a random lumpy background. We then consider maximum-likelihood methods for the estimation of numerical parameters from list-mode data, and we characterize the performance of these estimators via the Fisher information matrix. Reconstruction from PET list-mode data is considered next. In a process we call "double maximum-likelihood" reconstruction, we consider a simple PET imaging system and use maximum-likelihood methods to first estimate a parameter vector for each pair of gamma-ray photons detected by the hardware. The collection of these parameter vectors forms a list, which is then fed to another maximum-likelihood algorithm for volumetric reconstruction over a grid of voxels. Efficient parallel implementation of the algorithms discussed above is then presented. In this work, we take advantage of two low-cost, mass-produced computing platforms that have recently appeared on the market, and we provide details on implementing our algorithms on these devices. We conclude this dissertation by elaborating on a possible application of list-mode data to X-ray digital mammography, arguing that today's CMOS detectors and computing platforms have become fast enough to make list-mode data acquisition and processing feasible for X-ray digital mammography.
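The dissertation itself includes no source code in this abstract; purely as a rough Python/NumPy illustration of how a reconstruction algorithm can operate directly on a list of detected events, here is a minimal list-mode MLEM sketch. The function name, the dense event_probs matrix and the toy data are assumptions for illustration; a practical implementation computes the per-event system-matrix rows on the fly, typically on a GPU, and this is not the "double maximum-likelihood" method described above.

import numpy as np

def list_mode_mlem(event_probs, sensitivity, n_iters=20):
    """Minimal list-mode MLEM sketch (illustrative only).

    event_probs : (n_events, n_voxels) array; row i holds p(event_i | voxel j),
                  i.e. one system-matrix row per detected (listed) event.
    sensitivity : (n_voxels,) array; probability that an emission from voxel j
                  is detected at all.
    Returns the estimated emission image (activity per voxel).
    """
    n_events, n_voxels = event_probs.shape
    lam = np.ones(n_voxels)                      # uniform initial estimate
    for _ in range(n_iters):
        forward = event_probs @ lam              # expected rate for each list entry
        forward = np.maximum(forward, 1e-12)     # guard against division by zero
        backproj = event_probs.T @ (1.0 / forward)
        lam *= backproj / np.maximum(sensitivity, 1e-12)
    return lam

# Tiny synthetic example: 3 voxels, 5 detected events with made-up detection probabilities.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((5, 3))
    s = A.sum(axis=0)
    print(list_mode_mlem(A, s))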
2

Creation, study and optimization of real-time photorealistic renderings using programmable graphics processors

Σταυρόπουλος, Ασημάκης, 22 September 2009
Programmable graphics processing units (GPUs) are powerful parallel processors and are now found in every modern personal computer (PC). GPUs take over and accelerate the drawing of two- and three-dimensional graphics on the computer screen. Their evolution over the last decade has been so rapid that they now exceed modern central processing units (CPUs) in complexity, and beyond graphics they can accelerate other computationally demanding applications, such as artificial intelligence and the simulation of physical interactions between objects (collisions, explosions, fluid motion). The purpose of this thesis is the creation, study and optimization of shading algorithms running on GPUs in real time. The term shading refers to the interaction of light with the materials of the objects in a virtual three-dimensional environment. The thesis presents the GPU programming interfaces (APIs) and languages, as well as techniques for optimizing the execution of shaders, which is a matter of major importance in real-time simulations.
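The shaders discussed in the thesis run on the GPU in a shading language, which is not reproduced in this abstract; purely to illustrate what a shading computation does (the interaction of light with a surface material), here is a minimal CPU-side Python sketch of a textbook Phong model. The function and parameter names are assumptions, not code from the thesis.

import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def phong_shade(normal, light_dir, view_dir, albedo, light_color,
                specular_color, shininess):
    """Evaluate a basic Phong shading model at a single surface point.

    Direction vectors point away from the surface; the result is the reflected RGB color.
    """
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    diffuse = albedo * light_color * max(np.dot(n, l), 0.0)
    r = 2.0 * np.dot(n, l) * n - l                 # mirror reflection of the light direction
    specular = specular_color * light_color * max(np.dot(r, v), 0.0) ** shininess
    return np.clip(diffuse + specular, 0.0, 1.0)

# Example: a red, slightly glossy surface lit from above and viewed head-on.
color = phong_shade(normal=np.array([0.0, 0.0, 1.0]),
                    light_dir=np.array([0.3, 0.3, 1.0]),
                    view_dir=np.array([0.0, 0.0, 1.0]),
                    albedo=np.array([0.8, 0.1, 0.1]),
                    light_color=np.array([1.0, 1.0, 1.0]),
                    specular_color=np.array([0.5, 0.5, 0.5]),
                    shininess=32.0)
print(color)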
3

Efficient Compilation of Stream Programs onto Multi-cores with Accelerators

Udupa, Abhishek, 07 1900
Over the past two decades, microprocessor manufacturers have typically relied on wider issue widths and deeper pipelines to obtain performance improvements for single-threaded applications. However, in recent years, with power dissipation and wire delays becoming primary design constraints, this approach can no longer be used effectively to yield performance improvements. Processor designers and vendors are therefore universally moving towards multi-core designs; examples include commodity general-purpose multi-core processors, the Cell BE accelerator from IBM, and the Graphics Processing Units from NVIDIA and ATI. Although these multi- and many-core architectures can provide enormous performance benefits, they are difficult to program for because of the complexity of writing explicitly parallel code. The ubiquity of computationally intensive media-processing applications makes it imperative to consider new programming frameworks and languages that can express parallelism in an easy, portable manner. The StreamIt programming language has been proposed to efficiently exploit parallelism at various levels on general-purpose multi-core architectures and stream processors, and to allow media-processing and DSP applications to be developed in an easy and portable fashion. The StreamIt model allows programmers to specify a program as a set of filters connected by FIFO communication channels. The graphs specified by StreamIt programs expose task, data and pipeline parallelism, which can be exploited on modern Graphics Processing Units (GPUs): these have emerged as powerful commodity stream processors that support abundant parallelism in hardware. The first part of this thesis deals with the challenges in mapping StreamIt programs to GPUs and proposes an efficient technique to software pipeline the execution of stream programs on GPUs. We formulate this problem, both the scheduling and the assignment of filters to processors, as an Integer Linear Program (ILP), which is then solved using ILP solvers. We also describe a novel buffer layout technique for GPUs which facilitates exploiting the high memory bandwidth available on GPUs. The proposed schedule utilizes both the scalar units of the GPU, to exploit data parallelism, and the multiprocessors, to exploit task and pipeline parallelism. We have evaluated our approach on a platform equipped with an NVIDIA GeForce 8800 GTS 512 GPU; it yields a geometric mean speedup of 5.02X, with a maximum speedup of 36.83X, across a set of StreamIt benchmarks, measured relative to an optimized single-threaded CPU execution. While software pipelining the execution of stream programs on GPUs is efficient and performs well, it does not utilize the CPU cores to perform useful computation. Further, it does not support programs with stateful filters, i.e. filters that are not data parallel owing to a dependence between successive firings that is carried through the implicit state of the filter. The second part of the thesis addresses these issues and describes a novel method to orchestrate the execution of a StreamIt program on the multiple cores of a system and on GPUs in a synergistic manner. The proposed approach identifies, using profiling, the relative benefits of executing a task on the superscalar CPU cores and on the accelerator.
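StreamIt programs are written in StreamIt's own Java-like syntax, which is not reproduced in this abstract; the following Python sketch only illustrates the execution model described above, filters with fixed consumption rates connected by FIFO channels, using made-up filter names and a naive demand-driven schedule rather than the thesis's software-pipelined one.

from collections import deque

class Filter:
    """A stream filter: each firing pops `pop` items from its input FIFO
    and pushes whatever `work` returns onto its output FIFO."""
    def __init__(self, name, pop, work):
        self.name, self.pop, self.work = name, pop, work

    def can_fire(self, in_q):
        return len(in_q) >= self.pop

    def fire(self, in_q, out_q):
        items = [in_q.popleft() for _ in range(self.pop)]
        out_q.extend(self.work(items))

# Pipeline: duplicate each sample, then average adjacent pairs.
duplicate = Filter("duplicate", pop=1, work=lambda xs: [xs[0], xs[0]])
pair_avg  = Filter("pair_avg",  pop=2, work=lambda xs: [(xs[0] + xs[1]) / 2.0])

q_in, q_mid, q_out = deque(range(8)), deque(), deque()

# Naive schedule: fire each filter whenever enough input is buffered.
while duplicate.can_fire(q_in) or pair_avg.can_fire(q_mid):
    if duplicate.can_fire(q_in):
        duplicate.fire(q_in, q_mid)
    if pair_avg.can_fire(q_mid):
        pair_avg.fire(q_mid, q_out)

print(list(q_out))   # each input value appears once, averaged with its own duplicate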
We formulate the problem of partitioning the work between the CPU cores and the GPU, taking into account the latencies of data transfers, the limited DMA bandwidth available, and the buffer layout transformations required by the partitioning, as an integrated Integer Linear Program (ILP) which can then be solved by an ILP solver. Since solving an ILP is NP-hard in the general case and may thus require a large amount of time, we also propose an efficient heuristic algorithm for partitioning the work between the CPU and the GPU; its solutions are within 9.05% of the optimal ILP solutions on average across the benchmark suite, while requiring 2 to 3 orders of magnitude less time than the ILP approach. The partitioned tasks are then software pipelined to execute on the multiple CPU cores and on the Streaming Multiprocessors (SMs) of the GPU. The software pipelining algorithm orchestrates the execution between the CPU cores and the GPU by emitting the code for the CPU, the code for the GPU, and the code for the required data transfers. Our experiments on a platform with eight CPU cores, of which four were used, and a GeForce 8800 GTS 512 GPU show a geometric mean speedup of 6.84X, with a maximum of 51.96X, over a single-threaded CPU execution across a set of StreamIt benchmarks.
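Neither the thesis's ILP formulation nor its heuristic is reproduced in this abstract, so the sketch below is only a generic greedy illustration of the same trade-off: profiled compute cost on each device versus a DMA transfer penalty for edges that cross the CPU/GPU boundary. All names, costs and the toy pipeline are assumptions, not the author's algorithm.

def greedy_partition(filters, edges, cpu_cost, gpu_cost, transfer_cost):
    """Greedy sketch of partitioning stream filters between CPU and GPU.

    filters            : list of filter names
    edges              : list of (producer, consumer, bytes_per_steady_state)
    cpu_cost, gpu_cost : dict filter -> profiled execution time per steady state
    transfer_cost      : cost per byte moved across the CPU/GPU boundary (DMA)
    Returns dict filter -> "cpu" or "gpu".
    """
    # Start with every filter on the device where its own compute cost is lowest.
    place = {f: ("gpu" if gpu_cost[f] <= cpu_cost[f] else "cpu") for f in filters}

    def total_cost(placement):
        compute = sum(gpu_cost[f] if placement[f] == "gpu" else cpu_cost[f]
                      for f in filters)
        transfer = sum(nbytes * transfer_cost
                       for src, dst, nbytes in edges
                       if placement[src] != placement[dst])
        return compute + transfer

    # Repeatedly flip any single filter whose move reduces the total cost.
    improved = True
    while improved:
        improved = False
        base = total_cost(place)
        for f in filters:
            trial = dict(place)
            trial[f] = "cpu" if place[f] == "gpu" else "gpu"
            if total_cost(trial) < base:
                place, base, improved = trial, total_cost(trial), True
    return place

# Toy pipeline A -> B -> C with made-up profiled costs and buffer sizes.
filters = ["A", "B", "C"]
edges = [("A", "B", 4096), ("B", "C", 4096)]
cpu = {"A": 5.0, "B": 40.0, "C": 6.0}
gpu = {"A": 8.0, "B": 4.0,  "C": 9.0}
print(greedy_partition(filters, edges, cpu, gpu, transfer_cost=0.001))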
4

Contributions to parallel stochastic simulation: application of good software engineering practices to the distribution of pseudorandom streams in hybrid Monte Carlo simulations / Contributions à la simulation stochastique parallèle : architectures logicielles pour la distribution de flux pseudo-aléatoires dans les simulations Monte Carlo sur CPU/GPU

Passerat-Palmbach, Jonathan, 11 October 2013
French abstract not available / The race for computing power intensifies every day in the simulation community. A few years ago, scientists started to harness the computing power of Graphics Processing Units (GPUs) to parallelize their simulations. As with any parallel architecture, not only does the simulation model implementation have to be ported to the new parallel platform, but all the supporting tools must be reimplemented as well. In the particular case of stochastic simulations, one of the major elements of the implementation is the source of pseudorandom numbers. Employing pseudorandom numbers in parallel applications is not a straightforward task, and it has to be done with caution so as not to introduce biases in the results of the simulation. This problem, called pseudorandom stream distribution, has been studied ever since parallel architectures became available. While the literature is full of solutions for handling pseudorandom stream distribution on CPU-based parallel platforms, the younger GPU programming community cannot yet draw on the same experience. In this thesis, we study how to correctly distribute pseudorandom streams on GPUs. From the existing solutions, we identified a need for good software engineering practices coupled with sound theoretical choices in the implementation. We propose a set of guidelines to follow when a PRNG has to be ported to the GPU, and put this advice into practice in a software library called ShoveRand. This library is used in a stochastic Polymer Folding model that we have implemented in C++/CUDA. The distribution of pseudorandom streams on manycore architectures is also one of our concerns; it resulted in a contribution named TaskLocalRandom, which targets parallel Java applications that use pseudorandom numbers and task frameworks. Finally, we reflect on how to choose the right parallel platform for a given application. To this end, we propose to automatically build prototypes of the parallel application running on a wide set of architectures. This approach relies on existing software engineering tools from the Java and Scala communities, most of them generating OpenCL source code from a high-level abstraction layer.
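ShoveRand (CUDA) and TaskLocalRandom (Java) are not shown in this abstract; the sketch below only illustrates the underlying idea of pseudorandom stream distribution, giving each parallel task its own statistically independent substream derived from one master seed, using NumPy's SeedSequence spawning in Python. The replication function and the toy Monte Carlo estimate are assumptions for illustration, not the thesis's libraries.

import numpy as np
from concurrent.futures import ProcessPoolExecutor

def run_replication(seed_seq, n_samples):
    """One stochastic replication driven by its own independent stream."""
    rng = np.random.default_rng(seed_seq)
    # Toy Monte Carlo estimate of pi from this task's private stream.
    pts = rng.random((n_samples, 2))
    return 4.0 * np.mean(np.sum(pts**2, axis=1) <= 1.0)

if __name__ == "__main__":
    root = np.random.SeedSequence(20131011)     # one master seed for the whole experiment
    children = root.spawn(8)                    # 8 statistically independent substreams
    with ProcessPoolExecutor() as pool:
        estimates = list(pool.map(run_replication, children, [100_000] * 8))
    print(estimates)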
