471

Parallelization of Algorithms on Graphics Cards for Motion Planning

Πάσχος, Ανδρέας 16 May 2014 (has links)
In this thesis, the main objective was the parallelization of a motion planning algorithm for graphics processing units. For this purpose, the Probabilistic Road Map (PRM) algorithm was chosen: it offers a high degree of parallelism and is consequently well suited to implementation on many-core processors. The framework used for GPU programming was OpenCL, because it provides a hardware-independent programming abstraction layer and is portable across GPUs from different vendors. The algorithm was decomposed into its structural components, and each one was studied independently with the aim of massive parallelization. During this process, the following algorithms were implemented:
• Sorting
• Breadth-First Graph Traversal
• Hashing
• Nearest Neighbour Search
The above algorithms were written in such a way that they can also be used autonomously, as separate components.
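The decomposition above can be sketched in miniature. The following is a hedged, sequential Python stand-in for two of the PRM stages the thesis offloads to OpenCL (nearest-neighbour connection and breadth-first graph search); sampling is assumed already done, `collision_free` is a placeholder predicate, and all names are illustrative rather than the thesis's code.

```python
import numpy as np
from collections import deque

def build_prm(samples, k, collision_free):
    """Connect each sampled configuration to its k nearest neighbours,
    keeping only collision-free edges (nearest-neighbour stage)."""
    n = len(samples)
    graph = {i: [] for i in range(n)}
    for i in range(n):
        d = np.linalg.norm(samples - samples[i], axis=1)
        for j in np.argsort(d)[1:k + 1]:  # index 0 is the sample itself
            j = int(j)
            if collision_free(samples[i], samples[j]):
                graph[i].append(j)
                graph[j].append(i)
    return graph

def bfs_path(graph, start, goal):
    """Breadth-first search over the roadmap (graph-traversal stage)."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None
```

In the parallel setting, each sample's distance computation and each frontier expansion in the BFS are independent, which is what makes PRM attractive for GPUs.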
472

TCRβ Repertoire Modeling Using A GPU-Based In-Silico DNA Recombination Algorithm

Striemer, Gregory M. January 2013 (has links)
High-throughput technologies in biological sciences have led to an exponential growth in the amount of data generated over the past several years. This data explosion is forcing scientists to search for innovative computational designs to reduce the time-scale of biological system simulations and enable rapid study of larger and more complex biological systems. In the field of immunobiology, one such simulation is known as DNA recombination. It is a critical process for investigating the correlation between disease and immune system responses, and for discovering the immunological changes that occur during aging through T-cell repertoire analysis. In this project we design and develop a massively parallel method tailored for Graphics Processing Unit (GPU) processors by identifying novel ways of restructuring the flow of the repertoire analysis. The DNA recombination process is the central mechanism for generating diversity among antigen receptors such as T-cell receptors (TCRs). This diversity is crucial for the development of the adaptive immune system. However, modeling all the αβ TCR sequences is encumbered by the enormity of the potential repertoire, which has been predicted to exceed 10¹⁵ sequences. Prior modeling efforts have therefore been limited to extrapolations based on the analysis of minor subsets of the overall TCRβ repertoire. In this study, we map the recombination process completely onto the GPU hardware architecture using the CUDA programming environment to circumvent prior limitations. For the first time, a model of the mouse TCRβ repertoire is presented to an extent that enabled the evaluation of the Convergent Recombination Hypothesis (CRH) comprehensively at a peta-scale level on a single GPU. Understanding the recombination process will allow scientists to better determine the likelihood of transplant rejection and of immune system responses to foreign antigens and cancers, and to plan treatments based on the genetic makeup of a given patient.
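A back-of-the-envelope sketch of why the repertoire explodes combinatorially: receptor diversity arises from pairing one V, D, and J gene segment with junctional variation, so candidate counts multiply. This is an illustrative Python enumeration, not the thesis's CUDA code, and the segment lists in the test are placeholders rather than the actual mouse loci.

```python
from itertools import product

def enumerate_recombinations(v_segs, d_segs, j_segs, junction_variants):
    """Enumerate every V-D-J pairing combined with each junctional variant.
    The result size is the product of the four list lengths, which is why
    exhaustive modeling requires massive parallelism at realistic scales."""
    return [(v, d, j, jn)
            for v, d, j, jn in product(v_segs, d_segs, j_segs, junction_variants)]
```

On a GPU, each tuple in this cross product is an independent work item, which is the property the recombination mapping exploits.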
473

Autonomic Programming Paradigm for High Performance Computing

Jararweh, Yaser January 2010 (has links)
The advances in computing and communication technologies and software tools have resulted in an explosive growth in networked applications and information services that cover all aspects of our life. These services and applications are inherently complex, dynamic, and heterogeneous. In a similar way, the underlying information infrastructure, e.g. the Internet, is large, complex, heterogeneous, and dynamic, globally aggregating large numbers of independent computing and communication resources. The combination of the two results in application development and management complexities that break current computing paradigms, which are based on static behaviors. As a result, applications, programming environments, and information infrastructures are rapidly becoming fragile, unmanageable, and insecure. This has led researchers to consider alternative programming paradigms and management techniques based on strategies used by biological systems. The autonomic programming paradigm is inspired by the human autonomic nervous system, which handles complexity, uncertainty, and abnormality. The overarching goal of the autonomic programming paradigm is to help build systems and applications capable of self-management. First, we investigated large-scale scientific computing applications, which generally experience different execution phases at run time; each phase has different computational, communication, and storage requirements, as well as different physical characteristics. In this dissertation, we present the Physics Aware Optimization (PAO) paradigm, which enables programmers to identify the appropriate solution methods to exploit the heterogeneity and dynamism of the application execution states. We implement a Physics Aware Optimization Manager to exploit the PAO paradigm.
We also present a self-configuration paradigm, based on the principles of autonomic computing, that can efficiently handle complexity, dynamism, and uncertainty in configuring server and networked systems and their applications. Our approach is based on making any resource or application operate as an Autonomic Component (that is, a self-managed component) by using our autonomic programming paradigm. Our PAO technique for a medical application yielded about a 3X performance improvement with 98.3% simulation accuracy compared to traditional techniques for performance optimization. Our self-configuration management for power and performance in a GPU cluster demonstrated 53.7% power savings for CUDA workloads while maintaining cluster performance within given acceptable thresholds.
474

Multi-Resolution Volume Rendering of Large Medical Data Sets on the GPU

Towfeek, Ajden January 2008 (has links)
Volume rendering techniques can be powerful tools when visualizing medical data sets. The ability to capture 3-D internal structures makes the technique attractive. Scanning equipment produces medical images of rapidly increasing resolution, resulting in greatly increased data set sizes. Despite the great amount of processing power CPUs deliver, the required precision in image quality can be hard to obtain in real-time rendering. Therefore, it is highly desirable to optimize the rendering process. Modern GPUs possess much more computational power and are available for general-purpose programming through high-level shading languages. Efficient representations of the data are crucial due to the limited memory provided by the GPU. This thesis describes the theoretical background and the implementation of an approach presented by Patric Ljung, Claes Lundström, and Anders Ynnerman at Linköping University. The main objective is to implement a fully working multi-resolution framework with two separate pipelines for pre-processing and real-time rendering, which uses the GPU to visualize large medical data sets.
475

Automatic Source-to-Source Program Transformations for GPU-like Hardware Accelerators

Amini, Mehdi 13 December 2012 (has links) (PDF)
Since the early 2000s, the raw performance of processor cores has ceased its exponential growth. Modern graphics circuits (GPUs) have been designed as chips composed of a veritable grid of several hundred or even thousands of compute units. Their computing capacity quickly led to their being diverted from their primary display function and exploited as general-purpose compute accelerators. However, programming a GPU efficiently outside of 3-D scene rendering remains a challenge. The jungle that reigns in the hardware ecosystem is mirrored in the software world, with ever more programming models, languages, and APIs, without any universal solution emerging.
This thesis proposes a compilation-based solution to partially address the three "P" properties: Performance, Portability, and Programmability. The goal is to automatically transform a sequential program into an equivalent program accelerated with a GPU. A prototype, Par4All, is implemented and validated by numerous experiments. Programmability and portability are guaranteed by construction, and while performance does not always match what an expert developer would obtain, it remains excellent over a wide range of kernels and applications.
A study of GPU architectures and of trends in the design of languages and programming frameworks is presented. Data placement between the host and the accelerator is performed without involving the developer. A communication optimization algorithm is proposed to send data to the GPU as early as possible and keep it there as long as it is not required on the host. Loop transformation techniques for kernel code generation are used, and even some well-known and proven ones must be adapted to the constraints posed by GPUs.
They are assembled coherently and scheduled within the flow of an interprocedural compiler. Preliminary work is presented on extending the approach to target multiple GPUs.
476

Parallel Electromagnetic Transient Simulation of Large-Scale Power Systems on Massive-threading Hardware

Zhou, Zhiyin Unknown Date
No description available.
477

Parallel algorithm design and implementation of regular/irregular problems: an in-depth performance study on graphics processing units

Solomon, Steven 16 January 2012 (has links)
Recently, interest in the Graphics Processing Unit (GPU) for general-purpose parallel application development and research has grown. Much of the current research on the GPU focuses on the acceleration of regular problems, as irregular problems typically do not provide the same level of performance on the hardware. We explore the potential of the GPU by investigating four problems with regular and/or irregular properties: lookback option pricing (regular), single-source shortest path (irregular), maximum flow (irregular), and task matching using multi-swarm particle swarm optimization (regular with elements of irregularity). We investigate the design, implementation, optimization, and performance of these algorithms on the GPU, and compare the results. Our results show that the regular problem achieves greater performance and requires less development effort than the irregular problems. However, we find the GPU still capable of providing high levels of acceleration for irregular problems.
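One way to see why single-source shortest path is "irregular" is the frontier-based formulation commonly mapped to GPUs: every edge relaxation within a frontier is data-parallel, but frontier sizes vary unpredictably between iterations, so work per step is uneven. The sketch below is a generic plain-Python formulation under those assumptions, not the thesis's implementation.

```python
import math

def sssp_frontier(edges, n, source):
    """Frontier-based SSSP. `edges` maps u -> list of (v, weight);
    returns the distance array. On a GPU, each relaxation inside the
    frontier loop would be an independent thread; the irregularity is
    that the frontier (and thus the thread count) changes every round."""
    dist = [math.inf] * n
    dist[source] = 0.0
    frontier = {source}
    while frontier:
        next_frontier = set()
        for u in frontier:                      # data-parallel on a GPU
            for v, w in edges.get(u, []):
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    next_frontier.add(v)
        frontier = next_frontier
    return dist
```

With non-negative weights this converges to the shortest distances; the varying frontier is exactly the load-imbalance source the thesis measures.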
478

Harnessing Data Parallel Hardware for Server Workloads

Agrawal, Sandeep R. January 2015 (has links)
Trends in increasing web traffic demand an increase in server throughput while preserving energy efficiency and total cost of ownership. Present work in optimizing data center efficiency primarily focuses on using general-purpose processors; however, these might not be the most efficient platforms for server workloads. Data parallel hardware achieves high energy efficiency by amortizing instruction costs across multiple data streams, and high throughput by enabling massive parallelism across independent threads. These benefits are traditionally considered applicable to scientific workloads, and common server tasks like page serving or search are considered unsuitable for a data parallel execution model.
Our work builds on the observation that server workload execution patterns are not completely unique across multiple requests. For a high enough arrival rate, a server has the opportunity to launch cohorts of similar requests on data parallel hardware, improving server performance and power/energy efficiency. We present a framework, called Rhythm, for high-throughput servers that can exploit similarity across requests to improve server performance and power/energy efficiency by launching data parallel executions for request cohorts. An implementation of the SPECweb Banking workload using Rhythm on NVIDIA GPUs provides a basis for evaluation.
Similarity search is another ubiquitous server workload that involves identifying the nearest neighbors to a given query across a large number of points. We explore the performance, power, and dollar benefits of using accelerators to perform similarity search for query cohorts in very high dimensions under tight deadlines, and demonstrate an implementation on GPUs that searches across a corpus of billions of documents and is significantly cheaper than commercial deployments. We show that with software and system modifications, data parallel designs can greatly outperform common task parallel implementations.
479

Gravitational Microlensing: An automated high-performance modelling system

McDougall, Alistair January 2014 (has links)
Nightly surveys of the skies detect thousands of new gravitational microlensing events every year. With the increasing number of telescopes and advancements in the technologies used, the detection rate is growing. Of these events, those that display the characteristics of a binary lens are of particular interest. They require special attention with follow-up observations if possible, as such events can lead to new planetary detections. To characterise a new planetary event, high-cadence, accurate observations are optimal. However, without the ability to repeat observations, identification that any event may be planetary needs to happen before it finishes. I have developed a system that automatically retrieves all microlensing survey data and follow-up observations, models the events as single lenses, and publishes the results live to a website. With minimal human interaction, the modelling system is able to identify and initialize binary events, and perform a thorough search of the seven-dimensional parameter space of a binary lens. These results are also presented live through the website, giving observers an up-to-date view of the latest binary solutions. The real-time modelling enables prompt analysis of ongoing events, providing observers with the information needed to determine whether further observations are desired for the modelled events. An archive of all modelled binary lens events is maintained and accessible through the website. To date the archive contains binary lens solutions for 68 unique events from the 2014 observing season. The system has been validated through comparisons with previously published models, and is in use during the current observing season. This year it has played a role in identifying new planetary candidate events, confirming proposed solutions, and providing alternate viable solutions to previously presented ones.
480

MPI WITHIN A GPU

Young, Bobby Dalton 01 January 2009 (has links)
GPUs offer high-performance floating-point computation at commodity prices, but their usage is hindered by programming models which expose the user to irregularities in the current shared-memory environments and require learning new interfaces and semantics. This thesis will demonstrate that the message-passing paradigm can be conceptually cleaner than the current data-parallel models for programming GPUs because it can hide the quirks of current GPU shared-memory environments, as well as GPU-specific features, behind a well-established and well-understood interface. This will be shown by demonstrating a proof-of-concept MPI implementation which provides cleaner, simpler code with a reasonable performance cost. This thesis will also demonstrate that, although there is a virtualization constraint imposed by MPI, this constraint is harmless as long as the virtualization was already chosen to be optimal in terms of a strong execution model and nearly-optimal execution time. This will be demonstrated by examining execution times with varying virtualization using a computationally-expensive micro-kernel.
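The message-passing idea the thesis argues for can be illustrated in a toy form: workers exchange data only through an explicit send/recv interface, so the shared structure underneath (here, plain Python queues standing in for GPU shared memory) is hidden behind well-understood semantics. This `Comm` class is a hypothetical illustration of the paradigm, not the thesis's GPU-resident MPI implementation.

```python
import queue
import threading

class Comm:
    """Rank-indexed mailboxes: callers never touch the shared queues
    directly, only the send/recv interface, mirroring how MPI hides a
    platform's memory quirks behind message-passing semantics."""
    def __init__(self, n):
        self._boxes = [queue.Queue() for _ in range(n)]

    def send(self, data, dest):
        self._boxes[dest].put(data)

    def recv(self, rank):
        return self._boxes[rank].get()  # blocks until a message arrives

def worker(comm, rank, out):
    # Rank 0 produces a value; rank 1 receives it without ever seeing
    # the underlying shared structure.
    if rank == 0:
        comm.send("payload", dest=1)
    else:
        out[rank] = comm.recv(rank)
```

The same interface could in principle front any backing store, which is the portability argument the thesis makes for message passing over raw shared-memory models.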
