41

Grafikkort till parallella beräkningar / Graphics Cards for Parallel Computations

Music, Sani January 2012 (has links)
This study describes how we can use graphics cards for general-purpose computing, which differs from the field where graphics cards are most commonly used, multimedia. The study describes and discusses present-day alternatives for using graphics cards for general operations. In this study we use and describe the Nvidia CUDA architecture. The study describes how we can use graphics cards for general operations from the point of view of a reader who has programming knowledge in some high-level language and a basic understanding of how a computer works. We use accelerated libraries (THRUST and CUBLAS) to achieve our goals on the graphics card, which are software development and benchmarking. The results are programs addressing specific problems (matrix multiplication, sorting, binary search, vector inversion) together with the execution time and speedup for these programs. The graphics card is compared to the processor running serially and to the processor running in parallel. Results show a speedup of up to approximately 50 times compared to serial implementations on the processor.
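As an illustration of the accelerated-library approach this abstract describes, the sketch below sorts data on the GPU with Thrust and multiplies two matrices with cuBLAS. It is a minimal, generic example, not code from the thesis; the sizes, fill values, and variable names are assumptions made for illustration only.

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <cublas_v2.h>
#include <algorithm>
#include <cstdlib>

int main() {
    // Sort one million random integers on the GPU with Thrust.
    thrust::host_vector<int> h(1 << 20);
    std::generate(h.begin(), h.end(), std::rand);
    thrust::device_vector<int> d = h;          // copy to the GPU
    thrust::sort(d.begin(), d.end());          // parallel sort on the device

    // Multiply two 512x512 single-precision matrices with cuBLAS: C = A * B.
    // cuBLAS assumes column-major storage; with constant-filled matrices the
    // layout does not matter for this illustration.
    const int n = 512;
    thrust::device_vector<float> A(n * n, 1.0f), B(n * n, 2.0f), C(n * n, 0.0f);
    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, thrust::raw_pointer_cast(A.data()), n,
                        thrust::raw_pointer_cast(B.data()), n,
                &beta,  thrust::raw_pointer_cast(C.data()), n);
    cublasDestroy(handle);
    return 0;
}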
42

Hybridní raytracing v rozhraní DXR / Hybrid Raytracing in DXR

Polášek, Tomáš January 2019 (has links)
The goal of this thesis is to evaluate the usability of hardware-accelerated ray tracing in near-future rendering engines, specifically the DirectX Ray Tracing API and the Nvidia Turing architecture. The design and implementation of a hybrid rendering engine with support for hardware-accelerated ray tracing is included and used to implement frequently used graphical effects -- hard and soft shadows, reflections, and ambient occlusion. The assessment covers the difficulty of integration into a rendering engine, the performance of the resulting system, and the suitability of the chosen graphical effects for such an implementation. Performance parameters -- including the number of rays cast per second, the time to build acceleration structures, and the computation time on the GPU -- are measured and discussed.
43

Neuronové sítě pro klasifikaci typu a kvality průmyslových výrobků / Neural networks for visual classification and inspection of the industrial products

Míček, Vojtěch January 2020 (has links)
The aim of this master's thesis is to enable the evaluation of the quality, or the type, of products in industrial applications using artificial neural networks, especially in applications where the classical machine-vision approach is too complicated. The designed system is implemented on a specific hardware platform and then optimized for that platform to achieve the best possible performance.
44

Detekce objektů na GPU / Object Detection on GPU

Macenauer, Pavel January 2015 (has links)
This thesis addresses the topic of object detection on graphics processing units. As part of it, a system for object detection using NVIDIA CUDA was designed and implemented, allowing for real-time video object detection and bulk processing. Its main contribution is a study of the capabilities of NVIDIA CUDA technology and current graphics processing units for accelerating object detection. Parallel algorithms for object detection are also discussed and proposed.
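The abstract does not include code, but the core idea of mapping candidate detection windows to GPU threads can be sketched as below. The evaluateWindow function is a hypothetical stand-in for whatever classifier a real detector would use; none of the names or thresholds come from the thesis.

// Minimal sketch: one CUDA thread per candidate detection window.
// evaluateWindow() is a placeholder for a real classifier (e.g. a cascade
// of feature tests); the 4x4 intensity test is illustrative only.
__device__ bool evaluateWindow(const unsigned char* img, int w, int x, int y) {
    int sum = 0;
    for (int dy = 0; dy < 4; ++dy)
        for (int dx = 0; dx < 4; ++dx)
            sum += img[(y + dy) * w + (x + dx)];
    return sum > 8 * 255;  // arbitrary threshold, illustration only
}

// hits has one byte per pixel position; winSize is assumed to be at least 4.
__global__ void detectWindows(const unsigned char* img, int w, int h,
                              int winSize, unsigned char* hits) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x + winSize >= w || y + winSize >= h) return;   // window must fit
    hits[y * w + x] = evaluateWindow(img, w, x, y) ? 1 : 0;
}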
45

Fyzikální simulace v počítačových hrách / Physical Simulation in Computer Games

Dočkal, Jiří January 2010 (has links)
The thesis is concerned with modern game engines, focusing on physical simulation and particle systems. It offers an overview of architectures usable for game engine development and characterizes the most essential logical modules of a game engine, such as the scene graph, resource management, and rendering. Today's tools for physical simulation in games are also described. The main part of the thesis concentrates on the design and implementation of the author's own C3D game engine, which exploits the capabilities of the NVIDIA PhysX physics engine. The thesis incorporates modern techniques arising from the experience the author gained during development.
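As context for the engine-architecture discussion, engines built on physics middleware such as PhysX commonly drive the simulation from a fixed-timestep loop. The sketch below is a generic illustration of that pattern; the Scene and Renderer interfaces are hypothetical and do not reflect the C3D engine's actual code.

// Generic fixed-timestep game loop, as commonly used when stepping a physics
// engine from a rendering loop. Scene and Renderer are illustrative stubs.
#include <chrono>

struct Scene    { void stepPhysics(float dt) { /* advance rigid bodies */ } };
struct Renderer { void draw(const Scene&)    { /* submit draw calls    */ } };

void runLoop(Scene& scene, Renderer& renderer, bool& running) {
    using clock = std::chrono::steady_clock;
    const float fixedDt = 1.0f / 60.0f;          // physics step, seconds
    float accumulator = 0.0f;
    auto last = clock::now();
    while (running) {
        auto now = clock::now();
        accumulator += std::chrono::duration<float>(now - last).count();
        last = now;
        // Catch up on simulation in fixed increments to keep physics stable.
        while (accumulator >= fixedDt) {
            scene.stepPhysics(fixedDt);
            accumulator -= fixedDt;
        }
        renderer.draw(scene);
    }
}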
46

Efficient And Scalable Evaluation Of Continuous, Spatio-temporal Queries In Mobile Computing Environments

Cazalas, Jonathan M 01 January 2012 (has links)
A variety of research exists for the processing of continuous queries in large, mobile environments. Each method tries, in its own way, to address the computational bottleneck of constantly processing so many queries. In this research, we present a two-pronged approach to addressing this problem. First, we introduce an efficient and scalable system for monitoring traditional, continuous queries by leveraging the parallel processing capability of the Graphics Processing Unit. We examine a naive CPU-based solution for continuous range-monitoring queries and then extend this system using the GPU. Additionally, with mobile communication devices becoming a commodity, location-based services will become ubiquitous. To cope with the very high intensity of location-based queries, we propose a view-oriented approach to the location database, reducing computation costs by exploiting computation sharing among queries requiring the same view. Our studies show that by exploiting the parallel processing power of the GPU, we are able to significantly scale the number of mobile objects while maintaining an acceptable level of performance.

Our second approach was to view this research problem as one belonging to the domain of data streams. Several works have convincingly argued that the two research fields of spatio-temporal data streams and the management of moving objects can naturally come together [IlMI10, ChFr03, MoXA04]. For example, the output of a GPS receiver monitoring the position of a mobile object is viewed as a data stream of location updates. This data stream of location updates, along with those from the possibly many other mobile objects, is received at a centralized server, which processes the streams upon arrival, effectively updating the answers to the currently active queries in real time.

For this second approach, we present GEDS, a scalable, Graphics Processing Unit (GPU)-based framework for the evaluation of continuous spatio-temporal queries over spatio-temporal data streams. Specifically, GEDS employs the computation-sharing and parallel-processing paradigms to deliver scalability in the evaluation of continuous spatio-temporal range queries and continuous spatio-temporal kNN queries. The GEDS framework utilizes the parallel processing capability of the GPU, a stream processor by trade, to handle the computation required in this application. Experimental evaluation shows promising performance and demonstrates the scalability and efficacy of GEDS in spatio-temporal data streaming environments. Additional performance studies demonstrate that, even in light of the costs associated with memory transfers, the parallel processing power provided by GEDS clearly counters and outweighs any associated costs.

Finally, in an effort to move beyond the analysis of specific algorithms over the GEDS framework, we take a broader approach in our analysis of GPU computing. What algorithms are appropriate for the GPU? What types of applications can benefit from the parallel and stream processing power of the GPU? And can we identify a class of algorithms that are best suited for GPU computing? To answer these questions, we develop an abstract performance model detailing the relationship between the CPU and the GPU. From this model, we are able to extrapolate a list of attributes common to successful GPU-based applications, thereby providing insight into which algorithms and applications are best suited for the GPU and providing an estimated theoretical speedup for such GPU-based applications.
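A much-simplified sketch of the kind of GPU-parallel range-query evaluation the abstract describes is shown below: one thread per (query, object) pair. The flat arrays and the result encoding are invented for illustration and do not reflect the actual GEDS data structures.

// Simplified sketch of evaluating continuous range queries over moving
// objects on the GPU. Illustrative only, not the GEDS implementation.
struct Rect { float xmin, ymin, xmax, ymax; };

__global__ void rangeQueries(const float2* objPos, int numObjs,
                             const Rect* queries, int numQueries,
                             unsigned char* inside /* numQueries x numObjs */) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= numObjs * numQueries) return;
    int q = idx / numObjs;          // which query this thread serves
    int o = idx % numObjs;          // which moving object it tests
    float2 p = objPos[o];
    Rect r = queries[q];
    inside[idx] = (p.x >= r.xmin && p.x <= r.xmax &&
                   p.y >= r.ymin && p.y <= r.ymax) ? 1 : 0;
}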
47

Parallel Solution of the Subset-sum Problem: An Empirical Study

Bokhari, Saniyah S. 21 July 2011 (has links)
No description available.
48

Tuned and asynchronous stencil kernels for CPU/GPU systems

Venkatasubramanian, Sundaresan 18 May 2009 (has links)
We describe heterogeneous multi-CPU and multi-GPU implementations of Jacobi's iterative method for the 2-D Poisson equation on a structured grid, in both single- and double-precision. Properly tuned, our best implementation achieves 98% of the empirical streaming GPU bandwidth (66% of peak) on an NVIDIA C1060. Motivated to find a still faster implementation, we further consider "wildly asynchronous" implementations that can reduce or even eliminate the synchronization bottleneck between iterations. In these versions, which are based on the principle of chaotic relaxation (Chazan and Miranker, 1969), we simply remove or delay synchronization between iterations, thereby potentially trading off more flops (via more iterations to converge) for a higher degree of asynchronous parallelism. Our relaxed-synchronization implementations on a GPU can be 1.2-2.5x faster than our best synchronized GPU implementation while achieving the same accuracy. Looking forward, this result suggests research on similarly "fast-and-loose" algorithms in the coming era of increasingly massive concurrency and relatively high synchronization or communication costs.
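For readers unfamiliar with the baseline, a single synchronized Jacobi sweep for the 2-D Poisson equation on a uniform grid looks roughly like the kernel below; the grid layout and names are illustrative assumptions, not the thesis code. The "wildly asynchronous" variants described above essentially keep launching such sweeps without waiting for every block to finish the previous one.

// One Jacobi sweep for -Laplace(u) = f on an n x n grid with spacing h,
// using the standard 5-point stencil. Illustrative only; the tuned and
// asynchronous kernels in the thesis differ in blocking and synchronization.
__global__ void jacobiSweep(const float* u, float* uNew, const float* f,
                            int n, float h) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column
    if (i <= 0 || j <= 0 || i >= n - 1 || j >= n - 1) return;  // skip boundary
    uNew[i * n + j] = 0.25f * (u[(i - 1) * n + j] + u[(i + 1) * n + j] +
                               u[i * n + (j - 1)] + u[i * n + (j + 1)] +
                               h * h * f[i * n + j]);
}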
49

Robot Goalkeeper : A robotic goalkeeper based on machine vision and motor control

Adeboye, Taiyelolu January 2018 (has links)
This report shows a robust and efficient implementation of a speed-optimized algorithm for object recognition, 3D real-world localization, and tracking in real time. It details a design focused on detecting and following objects in flight, applied to a football in motion. The overall goal of the design was to develop a system capable of recognizing an object and its present and near-future location, while also actuating a robotic arm in response to the motion of the ball in flight. The implementation made use of image processing functions in C++, an NVIDIA Jetson TX1, and Stereolabs' ZED stereoscopic camera, connected to an embedded system controller for the robot arm. The image processing was done against a textured background, and the 3D location coordinates were used to correct a Kalman filter model that estimated and predicted the ball's location. A capture and processing speed of 59.4 frames per second was obtained with good depth-detection accuracy, and the ball was tracked well in the tests carried out.
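As a rough illustration of the tracking step described above, a constant-velocity Kalman filter applied independently per axis can be sketched as follows. The structure and noise parameters are generic assumptions, not the values or model used in the report.

// Minimal per-axis constant-velocity Kalman filter (position + velocity).
// Process/measurement noise values are illustrative assumptions only.
struct AxisKalman {
    float x = 0.0f, v = 0.0f;                        // state: position, velocity
    float P[2][2] = {{1.0f, 0.0f}, {0.0f, 1.0f}};    // state covariance
    float q = 1e-2f;                                 // process noise
    float r = 1e-1f;                                 // measurement noise

    void predict(float dt) {
        x += v * dt;                                 // constant-velocity motion
        // P = F P F^T + Q with F = [[1, dt], [0, 1]]
        P[0][0] += dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q;
        P[0][1] += dt * P[1][1];
        P[1][0] += dt * P[1][1];
        P[1][1] += q;
    }
    void update(float z) {                           // z: measured position
        float S  = P[0][0] + r;                      // innovation covariance
        float k0 = P[0][0] / S, k1 = P[1][0] / S;    // Kalman gain
        float y  = z - x;                            // innovation
        x += k0 * y;  v += k1 * y;
        float p00 = P[0][0], p01 = P[0][1];
        P[0][0] -= k0 * p00;  P[0][1] -= k0 * p01;   // P = (I - K H) P
        P[1][0] -= k1 * p00;  P[1][1] -= k1 * p01;
    }
};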
50

Akcelerace heuristických metod diskrétní optimalizace na GPU / Acceleration of Discrete Optimization Heuristics Using GPU

Pecháček, Václav January 2012 (has links)
The thesis deals with discrete optimization problems, focusing on faster ways to find good solutions by means of heuristics and parallel processing. Based on the ant colony optimization (ACO) algorithm coupled with a k-optimization local search, it aims at massively parallel computing on graphics processors provided by the Nvidia CUDA platform. The well-known travelling salesman problem (TSP) is used as a case study. The solution is based on dividing the task into subproblems using tour-based partitioning, processing the distinct parts in parallel, and recombining them afterwards. The provided parallel code performs the computation more than seventeen times faster than the sequential version.
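To make the ACO step concrete, the roulette-wheel selection of an ant's next city can be sketched as below. The parameter names and weighting follow generic ACO conventions (pheromone raised to alpha, inverse distance raised to beta) and are not the thesis's tuned settings or its GPU kernel code.

#include <vector>
#include <cmath>
#include <cstdlib>

// Roulette-wheel selection of an ant's next city in ACO for the TSP.
// tau holds pheromone levels, dist holds city distances. Generic sketch only.
int selectNextCity(int current, const std::vector<bool>& visited,
                   const std::vector<std::vector<double>>& tau,
                   const std::vector<std::vector<double>>& dist,
                   double alpha, double beta) {
    int n = static_cast<int>(tau.size());
    std::vector<double> weight(n, 0.0);
    double total = 0.0;
    for (int j = 0; j < n; ++j) {
        if (visited[j] || j == current) continue;
        // Desirability: pheromone^alpha * (1/distance)^beta
        weight[j] = std::pow(tau[current][j], alpha) *
                    std::pow(1.0 / dist[current][j], beta);
        total += weight[j];
    }
    double r = total * (std::rand() / (RAND_MAX + 1.0));
    for (int j = 0; j < n; ++j) {
        if (weight[j] == 0.0) continue;
        r -= weight[j];
        if (r <= 0.0) return j;       // landed in this city's slice
    }
    return -1;                        // no unvisited city remains
}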
