131

Signal- och bildbehandling på moderna grafikprocessorer / Signal and image processing on modern graphics processors

Pettersson, Erik January 2005 (has links)
The modern graphics processing unit (GPU) is an extremely powerful device, potentially many times more powerful than a modern microprocessor. Due to its increasing programmability, it has recently become possible to use it in computation-intensive applications outside its normal domain. This work investigates the possibilities and limitations of general-purpose programming on GPUs. It concentrates mainly on signal and image processing, although many of the principles are applicable to other areas as well.

A framework for image processing on GPUs is implemented, and a few computer vision algorithms are realized and evaluated, among them stereo vision and optical flow. The results show that some applications can gain a substantial speedup when implemented correctly on the GPU, but that others can be inefficient or extremely hard to implement.
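To make the kind of workload concrete: sum-of-absolute-differences (SAD) block matching is a classic stereo kernel in which every output pixel is independent, which is exactly what makes such algorithms map well onto GPU fragment programs. The sketch below is an illustrative single-pixel CPU version in plain C, not code from the thesis; the caller is assumed to keep the matching window inside the image.

```c
#include <stdlib.h>

/* Find the disparity d minimizing the SAD cost between a window around
 * (x, y) in the left image and the same window shifted left by d in the
 * right image. Caller must ensure the window stays inside both images. */
int best_disparity(const unsigned char *left, const unsigned char *right,
                   int width, int x, int y, int max_disp, int win)
{
    int best_d = 0;
    long best_cost = -1;
    for (int d = 0; d <= max_disp && x - d - win >= 0; ++d) {
        long cost = 0;
        for (int dy = -win; dy <= win; ++dy)
            for (int dx = -win; dx <= win; ++dx) {
                int l = left [(y + dy) * width + (x + dx)];
                int r = right[(y + dy) * width + (x + dx - d)];
                cost += abs(l - r);   /* accumulate absolute differences */
            }
        if (best_cost < 0 || cost < best_cost) { best_cost = cost; best_d = d; }
    }
    return best_d;
}
```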
132

Modeling performance and power for energy-efficient GPGPU computing

Hong, Sunpyo 12 November 2012 (has links)
The objective of the proposed research is to develop an analytical model that predicts performance and power for many-core architectures, and further to propose a mechanism, leveraging the analytical model, that enables energy-efficient execution of an application. The key insight of the model is to investigate and quantify the complex relationship between thread-level parallelism and memory-level parallelism for an application on a given many-core architecture. Two metrics are proposed: memory warp parallelism (MWP), the number of overlapping memory accesses per core, and computation warp parallelism (CWP), which characterizes the application type. From these metrics, together with architectural and application parameters, the overall application performance is derived. The model uses statically available parameters such as instruction-mix information and input-data size, and its average prediction error is 13.3% on the GPU-computing benchmarks.

Another important aspect of using many-core architectures is reducing peak power and achieving energy savings. Using the proposed integrated power and performance (IPP) framework, the results show that different optimization points exist for the GPU architecture depending on the application type. The work shows that by activating fewer cores, 10.99% of run-time energy consumption can be saved for the bandwidth-limited benchmarks, and a projection of 25.8% energy savings is predicted when power gating at the core level is employed.

Finally, the model is extended to throughput prediction using OpenCL, targeting a wider variety of processors. First, multiple performance-related outputs are predicted, including upper-bound and lower-bound values. Second, using the model parameters, an application can be assigned to a category, each with its own suggestions for improving performance and energy efficiency. Third, the accuracy of the bandwidth saturation point is significantly improved by considering independent memory accesses and updating the performance model. Furthermore, a trade-off analysis using architectural and application parameters becomes straightforward, providing further insight into improving energy efficiency.

In the future, a computer system will contain hundreds of heterogeneous cores, so it is essential that a workload be scheduled to an efficient core or distributed across both types of cores. Preliminary work using the analytical model to schedule between CPU and GPU is demonstrated in the appendix. Since no profiling phase is required, the kernel code can be transformed to run more efficiently on the specific architecture. As a further extension, the mathematical relationship between speedup and energy efficiency is derived. Finally, future research directions are presented regarding the use of the model by programmers, compilers, and runtimes for future heterogeneous systems.
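As a rough illustration of how MWP and CWP interact, consider the toy estimator below. It is a sketch of the intuition only, not Hong's actual equations, and all parameters are hypothetical: when enough warps can overlap their memory requests (MWP ≥ CWP), memory latency is hidden and compute dominates; otherwise memory requests serialize and dominate the execution time.

```c
/* Toy illustration (not the thesis's model): estimate total cycles on
 * one core from per-warp compute/memory cycles and the MWP/CWP metrics. */
double estimate_cycles(double comp_cycles,  /* compute cycles per warp       */
                       double mem_cycles,   /* memory cycles per warp        */
                       double mwp,          /* memory warp parallelism       */
                       double cwp,          /* computation warp parallelism  */
                       double n_warps)      /* warps resident on the core    */
{
    if (mwp >= cwp)
        /* Memory accesses fully overlap with computation: compute-bound. */
        return comp_cycles * n_warps;
    else
        /* Memory-bound: requests are serialized in groups of size MWP.   */
        return mem_cycles * (n_warps / mwp) + comp_cycles;
}
```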
133

Directive-based General-purpose GPU Programming

Han, Tian Yi David 19 January 2010 (has links)
Graphics Processing Units (GPUs) have become a competitive accelerator for non-graphics applications, mainly driven by improvements in GPU programmability. Although the Compute Unified Device Architecture (CUDA) is a simple C-like interface for programming NVIDIA GPUs, porting applications to CUDA remains a challenge for average programmers. In particular, CUDA places on the programmer the burden of packaging GPU code in separate functions, of explicitly managing data transfer between the host and GPU memories, and of manually optimizing the utilization of GPU memory. We have designed hiCUDA, a high-level directive-based language for CUDA programming. It allows programmers to perform these tedious tasks in a simpler manner, by applying directives directly to the sequential code. We have also prototyped a compiler that translates a hiCUDA program to a CUDA program and can handle real-world applications. Experiments using seven standard CUDA benchmarks show that the simplicity hiCUDA provides comes at no expense to performance.
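For flavor, below is a vector-addition sketch annotated with hiCUDA-style directives. The directive spellings follow the hiCUDA publications, but the exact syntax here should be treated as illustrative rather than authoritative; the grid/block sizes are arbitrary.

```c
/* Sequential C code with hiCUDA-style directives: the compiler outlines
 * the marked loop into a CUDA kernel and generates the data transfers. */
void vec_add(float *a, float *b, float *c, int n)
{
#pragma hicuda global alloc a[*] copyin   /* allocate + copy to GPU      */
#pragma hicuda global alloc b[*] copyin
#pragma hicuda global alloc c[*]          /* allocate only               */

#pragma hicuda kernel vadd tblock(64) thread(256)
#pragma hicuda loop_partition over_tblock over_thread
    for (int i = 0; i < n; ++i)           /* iterations spread over grid */
        c[i] = a[i] + b[i];
#pragma hicuda kernel_end

#pragma hicuda global copyout c[*]        /* copy result back to host    */
#pragma hicuda global free a b c
}
```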
135

Calcul hautes performances pour les formulations intégrales en électromagnétisme basses fréquences / High-performance computing for integral formulations in low-frequency electromagnetics

Rubeck, Christophe 18 December 2012 (has links) (PDF)
Integral methods are particularly well suited to modeling electromagnetic systems because, unlike finite element methods, they do not require meshing inactive materials such as air. The resulting models are therefore light in terms of the number of degrees of freedom. However, they are full-interaction methods, producing dense system matrices that are expensive to compute in processor time and costly to store in main memory. In this work we reduce computation time through parallelism, i.e. the use of multiple processors, notably on graphics cards (GPGPU). We also reduce the memory cost through wavelet-based matrix compression (an algorithm close to image compression). Since this compression is lossy, we have developed a criterion to control the error it introduces. The methods developed are applied to an electrostatic formulation for computing capacitances, but they are a priori applicable to other formulations as well.
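A minimal sketch of the underlying idea, not Rubeck's implementation: one Haar wavelet level applied to a matrix row, followed by thresholding of small coefficients. The threshold is where the controlled loss enters, echoing the thesis's error-control criterion.

```c
#include <math.h>
#include <string.h>

/* In-place single-level Haar transform of row[0..n-1], n even;
 * tmp must hold n doubles of scratch space. */
void haar_step(double *row, double *tmp, int n)
{
    for (int i = 0; i < n / 2; ++i) {
        tmp[i]       = (row[2*i] + row[2*i + 1]) / sqrt(2.0); /* averages */
        tmp[n/2 + i] = (row[2*i] - row[2*i + 1]) / sqrt(2.0); /* details  */
    }
    memcpy(row, tmp, (size_t)n * sizeof(double));
}

/* Lossy step: zero out coefficients below eps. A larger eps yields a
 * sparser (more compressible) row but a larger reconstruction error. */
int threshold(double *row, int n, double eps)
{
    int kept = 0;
    for (int i = 0; i < n; ++i) {
        if (fabs(row[i]) < eps) row[i] = 0.0;
        else ++kept;
    }
    return kept;   /* number of nonzero coefficients kept */
}
```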
136

Algorithmen der Bildanalyse und -synthese für große Bilder und Hologramme / Algorithms for image analysis and synthesis of large images and holograms

Kienel, Enrico 22 February 2013 (has links) (PDF)
This thesis deals with algorithms from the field of image segmentation as well as data synthesis for the so-called hologram printing principle. In connection with an anatomically motivated research project, active contours are used for the semi-automatic segmentation of digitized histological sections. The particular challenge lies in developing approaches that adapt the method to very large images, which in this context can reach sizes of several hundred megapixels. Aiming for the greatest possible efficiency, but restricted to consumer hardware, ideas are presented that make active-contour-based segmentation feasible at such image sizes for the first time, and that contribute to speeding it up and reducing its memory footprint. In addition, the method is extended with an intuitive tool that allows interactive local correction of the final contour, considerably increasing its practical usability. The second part of the thesis deals with a printing principle for producing holograms of virtual objects. Hologram printing, whose name is meant to recall the way an inkjet printer works, requires special discrete image data called elementary holograms, which carry the visual information of different viewing directions through a fixed geometric position on the hologram plane. A complete hologram composed of many elementary holograms produces a considerable data volume which, depending on the parameters, can quickly reach the terabyte range. Two independent algorithms for generating suitably prepared data, making intensive use of standard graphics hardware, are presented, compared with respect to their computational and memory complexity, and evaluated with regard to quality.
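For context, the discrete internal-energy term of an active contour ("snake") is a purely local computation over each contour point and its neighbors, which is the locality that tiling strategies for very large images can exploit. A minimal sketch of the standard formulation, not Kienel's implementation:

```c
/* Internal energy of contour point i on a closed polygon of n points:
 * a continuity term (first difference) plus a curvature term (second
 * difference), weighted by alpha and beta. The external term, from the
 * image gradient, would be added separately. */
typedef struct { double x, y; } Pt;

double internal_energy(const Pt *p, int n, int i, double alpha, double beta)
{
    const Pt a = p[(i - 1 + n) % n], b = p[i], c = p[(i + 1) % n];
    double dx = c.x - b.x, dy = c.y - b.y;                  /* continuity */
    double cx = a.x - 2*b.x + c.x, cy = a.y - 2*b.y + c.y;  /* curvature  */
    return alpha * (dx*dx + dy*dy) + beta * (cx*cx + cy*cy);
}
```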
137

Robust Image Registration for Improved Clinical Efficiency : Using Local Structure Analysis and Model-Based Processing

Forsberg, Daniel January 2013 (has links)
Medical imaging plays an increasingly important role in modern healthcare. In medical imaging, it is often relevant to relate different images to each other, something which can prove challenging, since there rarely exists a pre-defined mapping between the pixels in different images. Hence, there is a need to find such a mapping/transformation, a procedure known as image registration. Over the years, image registration has proved useful in a number of clinical situations. Despite this, current use of image registration in clinical practice is rather limited, typically only for image fusion. The limited use is, to a large extent, caused by excessive computation times, a lack of established validation methods/metrics, and a general skepticism toward the trustworthiness of the estimated transformations in deformable image registration. This thesis aims to overcome some of the issues limiting the use of image registration by proposing a set of technical contributions and two clinical applications targeted at improved clinical efficiency. The contributions are made in the context of a generic framework for non-parametric image registration, using an image registration method known as the Morphon.

In image registration, regularization of the estimated transformation forms an integral part of controlling the registration process, and in this thesis two regularizers are proposed and their applicability demonstrated. Although the regularizers are similar in that they rely on local structure analysis, they differ in implementation: one is implemented as the application of a set of filter kernels, the other as the solution of a global optimization problem. Furthermore, it is proposed to use a set of quadrature filters with parallel scales when estimating the phase difference driving the registration, a proposal that brings both accuracy and robustness to the registration process, as shown on a set of challenging image sequences. Computational complexity is addressed by porting the employed Morphon algorithm to the GPU, achieving a performance improvement of 38-44x compared to a single-threaded CPU implementation.

The suggested clinical applications are based upon the concept of "paint on priors", which was formulated in conjunction with the initial presentation of the Morphon, and which denotes the notion of assigning a model a set of properties (local operators) that guide the registration process. In this thesis, this is taken one step further: properties of a model are assigned to the patient data after completed registration. Based upon this, an application using the concept of anatomical transfer functions is presented, in which different organs can be visualized with separate transfer functions. This has been implemented for both 2D slice visualization and 3D volume rendering. A second application is proposed, in which landmarks relevant for determining various measures describing the anatomy are transferred to the patient data. In particular, this is applied to idiopathic scoliosis and used to obtain measures relevant for assessing spinal deformity. In addition, a data analysis scheme is proposed, useful for quantifying the linear dependence between the different measures used to describe spinal deformities.
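The phase-difference driving force can be illustrated in a few lines. Given complex quadrature filter responses at corresponding positions in the fixed and moving images, the local phase difference is the argument of one response times the conjugate of the other, and it is proportional to the local displacement along the filter direction. This is a sketch of the standard phase-based construction, not Forsberg's code:

```c
#include <complex.h>
#include <math.h>

/* Local phase difference between quadrature filter responses q1 (fixed
 * image) and q2 (moving image); lies in (-pi, pi]. */
double phase_difference(double complex q1, double complex q2)
{
    return carg(q1 * conj(q2));
}

/* A common confidence measure: the geometric mean of the response
 * magnitudes, so weak (noisy) responses contribute little when the
 * displacement estimates are accumulated into a field. */
double confidence(double complex q1, double complex q2)
{
    return sqrt(cabs(q1) * cabs(q2));
}
```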
138

Programming Models and Runtimes for Heterogeneous Systems

Grossman, Max 16 September 2013 (has links)
With the plateauing of processor frequencies and the increase in energy consumption in computing, application developers are seeking new sources of performance acceleration. Heterogeneous platforms with multiple processor architectures offer one possible avenue for addressing these challenges. However, modern heterogeneous programming models tend to be either so low-level as to severely hinder programmer productivity, or so high-level as to limit optimization opportunities. The novel systems presented in this thesis strike a better balance between abstraction and transparency, enabling programmers to be productive and to produce high-performance applications on heterogeneous platforms. The thesis starts by summarizing the strengths, weaknesses, and features of existing heterogeneous programming models. It then introduces and evaluates four novel heterogeneous programming models and runtime systems: JCUDA, CnC-CUDA, DyGR, and HadoopCL. It concludes by positioning the key contributions relative to the state of the art and outlining possible directions for future work.
139

ParModelica : Extending the Algorithmic Subset of Modelica with Explicit Parallel Language Constructs for Multi-core Simulation

Gebremedhin, Mahder January 2011 (has links)
In today's world of high-tech manufacturing and computer-aided design, simulation of models is at the heart of the whole manufacturing process. Trying to represent and study the variables of real-world models using simulation computer programs can turn out to be a very expensive and time-consuming task. On the other hand, advancements in modern multi-core CPUs and general-purpose GPUs promise remarkable computational power.

Properly utilizing this computational power can reduce simulation time. To this end, modern modeling environments provide different optimization and parallelization options to take advantage of the available computational power. Some of these parallelization approaches are based on automatically extracting parallelism with the help of a compiler. Another approach is to provide model programmers with the necessary language constructs to express any potential parallelism in their models. This second approach is taken in this thesis work.

The OpenModelica modeling and simulation environment for the Modelica language has been extended with new language constructs for explicitly stating parallelism in algorithms. This slightly extended algorithmic subset of Modelica is called ParModelica. The new extensions allow models written in ParModelica to be translated to optimized OpenCL code, which can take advantage of the computational power of available multi-core CPUs and general-purpose GPUs.
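Illustratively, a ParModelica parallel loop iteration might be lowered to an OpenCL kernel in which each work-item executes one iteration. The sketch below is hypothetical; the actual OpenModelica code generator may emit very different code, and the loop body here is a stand-in.

```c
/* OpenCL C kernel of the sort a parallel Modelica loop body could be
 * lowered to: one work-item per loop iteration. */
__kernel void parfor_body(__global const double *x,
                          __global double *y,
                          const int n)
{
    int i = get_global_id(0);     /* this work-item's iteration index */
    if (i < n)
        y[i] = 2.0 * x[i] + 1.0;  /* stand-in for the loop body       */
}
```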
140

Solution of Sparse Systems on GPU Architecture

Lulec, Andac 01 June 2011 (has links) (PDF)
The solution of the linear system of equations is one of the core aspects of Finite Element Analysis (FEA) software. Since a large number of arithmetic operations is required to solve the system obtained by FEA, the solution of the linear equations has a very significant influence on the performance of the software. In recent years, the increasing demand for performance in the game industry has driven significant improvements in the performance of Graphics Processing Units (GPUs). With their massive floating-point capability, they have become attractive sources of performance for general-purpose programmers. For this reason, GPUs were chosen as the target hardware for developing an efficient parallel direct solver for the linear equations obtained from FEA.
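For context, the sketch below shows a sequential sparse forward substitution in CSR format, one of the core triangular-solve kernels of a direct solver; its row-to-row dependency chain is precisely what makes an efficient GPU direct solver challenging. This is illustrative only, not the thesis's solver, and assumes each row's entries are ordered by column with the diagonal stored last.

```c
/* Solve L * y = b for y, where L is lower triangular in CSR format
 * (rowptr, col, val). Assumption: within each row, entries are sorted
 * by column index and the final entry is the diagonal. Row i depends
 * on all earlier rows referenced by its off-diagonal entries. */
void csr_lower_solve(const int *rowptr, const int *col, const double *val,
                     const double *b, double *y, int n)
{
    for (int i = 0; i < n; ++i) {
        double s = b[i];
        int end = rowptr[i + 1] - 1;          /* last entry = diagonal */
        for (int k = rowptr[i]; k < end; ++k)
            s -= val[k] * y[col[k]];          /* uses already-solved y */
        y[i] = s / val[end];
    }
}
```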
