• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 166
  • 65
  • 52
  • 12
  • 10
  • 9
  • 7
  • 6
  • 4
  • 4
  • 3
  • 2
  • 2
  • 2
  • 2
  • Tagged with
  • 401
  • 205
  • 118
  • 107
  • 80
  • 74
  • 71
  • 54
  • 42
  • 42
  • 38
  • 36
  • 36
  • 32
  • 32
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
351

Částicové simulace v reálném čase / Real-Time Particle Simulations

Horváth, Zsolt January 2012 (has links)
Particle simulations in real-time become reality only a few years before, when in computer science occured the idea of GPGPU. This new technology allows use the massive force of graphics card for general purposes. Today, the trend is to accelerate existing algorithms by rewriting into parallel form. On this priciple operate the particle systems too. An interesting area of particle systems are fluid simulations. The simulations are based on the theory of Navier-Stokes equations and their numerical solutions with SPH (Smoothed particle hydrodynamics). Liquids are part of everyday life, and therefore it is important to render them realistically. They are used in modern computer games and different visualizations that run in real time, therefore they must be quickly displayed.
352

Algoritmy číslicového zpracování obrazu na grafických kartách / The algorithms of digital image processing on graphics cards

Bielczyk, Marek January 2016 (has links)
Purpose of this work is show possibility of using grapichs cart for imaging a video signal. This work is particularly focused on technology CUDA and OpenCL. The solution is first focused on graphics cart and show how has been changed components and how has been changed performaces of graphics cart. Then show CUDA and OpenCL technology itself, and show samples of codes with explain, what which code do. Output of this work is some programs, witch defined for both technology and for both procesors unit. Contribution of this work is show differents between procesors unit, witch can be used to right choose for design your own algorithm.
353

Verwendung von Grafikprozessoren zur Simulation von Diffusionsprozessen mit zufälligen Sierpiński-Teppichen

Lang, Jens 03 November 2008 (has links)
In dieser Arbeit wurde ein Verfahrung zur Random-Walk-Simulation auf fraktalen Strukturen untersucht. Es dient der Simulation von Diffusion in porösen Materialien. Konkret wurde der Mastergleichungsansatz zur Simulation eines Random Walks auf Sierpiński-Teppichen für GPGPUs (General Purpose Graphics Processing Units) in drei verschiedenen Versionen implementiert: Zunächst wurde die gesamte Fläche in einem zweidimensionalen Array gespeichert. Danach wurde eine Version untersucht, bei der nur die begehbaren Felder abgespeichert wurden. Diese Vorgehensweise spart Speicher, da die Sierpiński-Teppiche meist nur dünn besetzt sind. Weiter wurde die Implementierung verbessert, indem die Fläche jeweils dynamisch erweitert wird, wenn die Simulation an den Rand des vorhandenen Gebietes stößt. Die genutzten Grafikprozessoren arbeiten nach dem SIMD-Prinzip. Daher wurde zusätzlich untersucht, ob sich Laufzeitverbesserungen ergeben, wenn der Code dahingehend optimiert wird. Die Ergebnisse zeigen, dass sich in der Tat eine kürzere Laufzeit ergibt, wenn nur noch begehbare Felder abgespeichert werden. Noch weiter kann die Laufzeit mit der dynamischen Erweiterung der Simulationsfläche verkürzt werden. Optimierungen für die SIMD-Arbeitsweise der Prozessoren bringen jedoch keine Laufzeitver besserung. / This thesis investigates an algorithm for random walk simulations on fractal structures. Its purpose is the simulation of diffusion in porous materials. Indeed the master equation approach for the simulation of random walks on Sierpiński carpets has been implemented for GPGPUs (general purpose graphics processing units) in three different versions: In the first approach the whole carpet has been saved in a two-dimensional array. Secondly a version was investigated that only saves the present cells. This strategy saves memory as Sierpiński carpets are generally sparse. The implementation has been further improved by extending the carpet dynamically each time when the simulation reaches its current border. The graphics processing units that were used have a SIMD architecture. Therefore it has been investigated additionally if optimization for the SIMD architecture leads to performance improvements. The results show that execution time does indeed decrease if only present cells are being saved. It can be decreased further by dynamically extending the carpet. Optimizations for the SIMD architecture did not result in a reduced execution time.
354

Využití GPU pro akcelerované zpracování obrazu / Image Processing on GPUs

Bačík, Ladislav January 2008 (has links)
This master thesis deals with modern technologies in graphic hardware and using their for general purpose computing. It is primary focused on architecture of unified processors and algorithm implementation via CUDA programming interface. Thesis base is to choose suited algorithm for GPU horsepower demonstration. Main aim of this work is implementation of multiplatform library offering algorithms for discrete volumetric data vectorization. For this purpose was chosen algorithm Marching cubes that is able to find surface of processed object. In created library will be contained algorithm runnable on graphic device and also one runnable on CPU. Finally we compare both variants and discuss their pros and cons.
355

Moderní metody řešení eliptických parciálních diferenciálních rovnic / Advanced Eliptic Partial Differential Equations Solution

Valenta, Václav January 2009 (has links)
Partial differential equations solution and methods for transformation to a large sets of ordinary equations is described in this work. Taylor series method is important for this work. This method needs higher derivatives for correct work. Ways how to compute higher derivatives are also discused in this work.
356

Co-designing Communication Middleware and Deep Learning Frameworks for High-Performance DNN Training on HPC Systems

Awan, Ammar Ahmad 10 September 2020 (has links)
No description available.
357

A Conjugate Residual Solver with Kernel Fusion for massive MIMO Detection

Broumas, Ioannis January 2023 (has links)
This thesis presents a comparison of a GPU implementation of the Conjugate Residual method as a sequence of generic library kernels against implementations ofthe method with custom kernels to expose the performance gains of a keyoptimization strategy, kernel fusion, for memory-bound operations which is to makeefficient reuse of the processed data. For massive MIMO the iterative solver is to be employed at the linear detection stageto overcome the computational bottleneck of the matrix inversion required in theequalization process, which is 𝒪(𝑛3) for direct solvers. A detailed analysis of howone more of the Krylov subspace methods that is feasible for massive MIMO can beimplemented on a GPU as a unified kernel is given. Further, to show that kernel fusion can improve the execution performance not onlywhen the input data is large matrices-vectors as in scientific computing but also inthe case of massive MIMO and possibly similar cases where the input data is a largenumber of small matrices-vectors that must be processed in parallel.In more details, focusing on the small number of iterations required for the solver toachieve a close enough approximation of the exact solution in the case of massiveMIMO, and the case where the number of users matches the size of a warp, twodifferent approaches that allow to fully unroll the algorithm and gradually fuse allthe separate kernels into a single, until reaching a top-down hardcodedimplementation are proposed and tested. Targeting to overcome the algorithms computational burden which is the matrixvector product, further optimization techniques such as two ways to utilize the faston-chip memories, preloading the matrix in shared memory and preloading thevector in shared memory, are tested and proposed to achieve high efficiency andhigh parallelism.
358

Analysis, Implementation and Evaluation of Direction Finding Algorithms using GPU Computing / Analys, implementering och utvärdering av riktningsbestämningsalgoritmer på GPU

Andersdotter, Regina January 2022 (has links)
Direction Finding (DF) algorithms are used by the Swedish Defence Research Agency (FOI) in the context of electronic warfare against radio. Parallelizing these algorithms using a Graphics Processing Unit (GPU) might improve performance, and thereby increase military support capabilities. This thesis selects the DF algorithms Correlative Interferometer (CORR), Multiple Signal Classification (MUSIC) and Weighted Subspace Fitting (WSF), and examines to what extent GPU implementation of these algorithms is suitable, by analysing, implementing and evaluating. Firstly, six general criteria for GPU suitability are formulated. Then the three algorithms are analyzed with regard to these criteria, giving that MUSIC and WSF are both 58% suitable, closely followed by CORR on 50% suitability. MUSIC is selected for implementation, and an open source implementation is extended to three versions: a multicore CPU version, a GPU version (with Eigenvalue Decomposition (EVD) and pseudo spectrum calculation performed on the GPU), and a MIXED version (with only pseudo spectrum calculation on the GPU). These versions are then evaluated for angle resolutions between 1° and 0.025°, and CUDA block sizes between 8 and 1024. It is found that the GPU version is faster than the CPU version for angle resolutions above 0.1°, and the largest measured speedup is 1.4 times. The block size has no large impact on the total runtime. In conclusion, the overall results indicate that it is not entirely suitable, yet somewhat beneficial for large angle resolutions, to implement MUSIC using GPU computing.
359

Efficient And Scalable Evaluation Of Continuous, Spatio-temporal Queries In Mobile Computing Environments

Cazalas, Jonathan M 01 January 2012 (has links)
A variety of research exists for the processing of continuous queries in large, mobile environments. Each method tries, in its own way, to address the computational bottleneck of constantly processing so many queries. For this research, we present a two-pronged approach at addressing this problem. Firstly, we introduce an efficient and scalable system for monitoring traditional, continuous queries by leveraging the parallel processing capability of the Graphics Processing Unit. We examine a naive CPU-based solution for continuous range-monitoring queries, and we then extend this system using the GPU. Additionally, with mobile communication devices becoming commodity, location-based services will become ubiquitous. To cope with the very high intensity of location-based queries, we propose a view oriented approach of the location database, thereby reducing computation costs by exploiting computation sharing amongst queries requiring the same view. Our studies show that by exploiting the parallel processing power of the GPU, we are able to significantly scale the number of mobile objects, while maintaining an acceptable level of performance. Our second approach was to view this research problem as one belonging to the domain of data streams. Several works have convincingly argued that the two research fields of spatiotemporal data streams and the management of moving objects can naturally come together. [IlMI10, ChFr03, MoXA04] For example, the output of a GPS receiver, monitoring the position of a mobile object, is viewed as a data stream of location updates. This data stream of location updates, along with those from the plausibly many other mobile objects, is received at a centralized server, which processes the streams upon arrival, effectively updating the answers to the currently active queries in real time. iv For this second approach, we present GEDS, a scalable, Graphics Processing Unit (GPU)-based framework for the evaluation of continuous spatio-temporal queries over spatiotemporal data streams. Specifically, GEDS employs the computation sharing and parallel processing paradigms to deliver scalability in the evaluation of continuous, spatio-temporal range queries and continuous, spatio-temporal kNN queries. The GEDS framework utilizes the parallel processing capability of the GPU, a stream processor by trade, to handle the computation required in this application. Experimental evaluation shows promising performance and shows the scalability and efficacy of GEDS in spatio-temporal data streaming environments. Additional performance studies demonstrate that, even in light of the costs associated with memory transfers, the parallel processing power provided by GEDS clearly counters and outweighs any associated costs. Finally, in an effort to move beyond the analysis of specific algorithms over the GEDS framework, we take a broader approach in our analysis of GPU computing. What algorithms are appropriate for the GPU? What types of applications can benefit from the parallel and stream processing power of the GPU? And can we identify a class of algorithms that are best suited for GPU computing? To answer these questions, we develop an abstract performance model, detailing the relationship between the CPU and the GPU. From this model, we are able to extrapolate a list of attributes common to successful GPU-based applications, thereby providing insight into which algorithms and applications are best suited for the GPU and also providing an estimated theoretical speedup for said GPU-based applications
360

Interactive Mesostructures

Nykl, Scott L. January 2013 (has links)
No description available.

Page generated in 0.0639 seconds