• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 475
  • 88
  • 87
  • 56
  • 43
  • 21
  • 14
  • 14
  • 11
  • 5
  • 5
  • 3
  • 3
  • 3
  • 3
  • Tagged with
  • 989
  • 321
  • 204
  • 184
  • 169
  • 165
  • 154
  • 138
  • 124
  • 104
  • 97
  • 95
  • 93
  • 88
  • 83
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
571

Real-time Rendering with Heterogeneous GPUs

Xiao Lei (8803037) 06 May 2020 (has links)
<div>Over the years, the performance demand for graphics applications has been steadily increasing. While upgrading the hardware is one direct solution, the emergence of the new low-level and low-overhead graphics APIs like Vulkan also exposed the possibility of improving rendering performance from the bottom of software implementation.</div><div><br></div><div>Most of the recent years’ middle- to high-end personal computers are equipped with both integrated and discrete GPUs. However, with previous graphics APIs, it is hard to put these two heterogeneous GPUs to work concurrently in the same application without tailored driver support.</div><div><br></div><div>This thesis provides an exploration into the utilization of such heterogeneous GPUs in real-time rendering with the help of Vulkan API. This paper first demonstrates the design and implementation details for the proposed heterogeneous GPUs working model. After that, the paper presents the test of two workload offloading strategies: offloading screen space output workload to the integrated GPU and offloading asynchronous computation workload to the integrated GPU.</div><div><br></div>While this study failed to obtain performance improvement through offloading screen space output workload, it is successful in validating that offloading asynchronous computation workload from the discrete GPU to the integrated GPU can improve the overall system performance. This study proves that it is possible to make use of the integrated and discrete GPUs concurrently in the same application with the help of Vulkan. And offloading asynchronous computation workload from the discrete GPU to the integrated GPU can provide up to 3-4% performance improvement with combinations like UHD Graphics 630 + RTX 2070 Max-Q and HD Graphics 630 + GTX 1050.
572

Deep Learning with Go

Derek Leigh Stinson (8812109) 08 May 2020 (has links)
Current research in deep learning is primarily focused on using Python as a support language. Go, an emerging language, that has many benefits including native support for concurrency has seen a rise in adoption over the past few years. However, this language is not widely used to develop learning models due to the lack of supporting libraries and frameworks for model development. In this thesis, the use of Go for the development of neural network models in general and convolution neural networks is explored. The proposed study is based on a Go-CUDA implementation of neural network models called GoCuNets. This implementation is then compared to a Go-CPU deep learning implementation that takes advantage of Go's built in concurrency called ConvNetGo. A comparison of these two implementations shows a significant performance gain when using GoCuNets compared to ConvNetGo.<br>
573

Image Space Tensor Field Visualization Using a LIC-like Method

Eichelbaum, Sebastian 20 October 2017 (has links)
Tensors are of great interest to many applications in engineering and in medical imaging, but a proper analysis and visualization remains challenging. Physics-based visualization of tensor fields has proven to show the main features of symmetric second-order tensor fields, while still displaying the most important information of the data, namely the main directions in medical diffusion tensor data using texture and additional attributes using color-coding, in a continuous representation. Nevertheless, its application and usability remains limited due to its computational expensive and sensitive nature. We introduce a novel approach to compute a fabric-like texture pattern from tensor fields on arbitrary non-selfintersecting surfaces that is motivated by image space line integral convolution (LIC). Our main focus lies on regaining three-dimensionality of the data under user interaction, such as rotation and scaling. We employ a multi-pass rendering approach to estimate proper modification of the LIC noise input texture to support the three-dimensional perception during user interactions.
574

Robust Real-Time Model Predictive Control for High Degree of Freedom Soft Robots

Hyatt, Phillip Edmond 04 June 2020 (has links)
This dissertation is focused on the modeling and robust model-based control of high degree-of-freedom (DoF) systems. While most of the contributions are applicable to any difficult-to-model system, this dissertation focuses specifically on applications to large-scale soft robots because their many joints and pressures constitute a high-DoF system and their inherit softness makes them difficult to model accurately. First a joint-angle estimation and kinematic calibration method for soft robots is developed which is shown to decrease the pose prediction error at the end of a 1.5 m robot arm by about 85\%. A novel dynamic modelling approach which can be evaluated within microseconds is then formulated for continuum type soft robots. We show that deep neural networks (DNNs) can be used to approximate soft robot dynamics given training examples from physics-based models like the ones described above. We demonstrate how these machine-learning-based models can be evaluated quickly to perform a form of optimal control called model predictive control (MPC). We describe a method of control trajectory parameterization that enables MPC to be applied to systems with more DoF and with longer prediction horizons than previously possible. We show that this parameterization decreases MPC's sensitivity to model error and drastically reduces MPC solve times. A novel form of MPC is developed based on an evolutionary optimization algorithm that allows the optimization to be parallelized on a computer's graphics processing unit (GPU). We show that this evolutionary MPC (EMPC) can greatly decrease MPC solve times for high DoF systems without large performance losses, especially given a large GPU. We combine the ideas of machine learned DNN models of robot dynamics, with parameterized and parallelized MPC to obtain a nonlinear version of EMPC which can be run at higher rates and find better solutions than many state-of-the-art optimal control methods. Finally we demonstrate an adaptive form of MPC that can compensate for model error or changes in the system to be controlled. This adaptive form of MPC is shown to inherit MPC's robustness to completely unmodeled disturbances and adaptive control's ability to decrease trajectory tracking errors over time.
575

Visualisation et traitements interactifs de grilles régulières 3D haute-résolution virtualisées sur GPU. Application aux données biomédicales pour la microscopie virtuelle en environnement HPC. / Interactive visualisation and processing of high-resolution regular 3D grids virtualised on GPU. Application to biomedical data for virtual microscopy in HPC environment.

Courilleau, Nicolas 29 August 2019 (has links)
La visualisation de données est un aspect important de la recherche scientifique dans de nombreux domaines.Elle permet d'aider à comprendre les phénomènes observés voire simulés et d'en extraire des informations à des fins notamment de validations expérimentales ou tout simplement pour de la revue de projet.Nous nous intéressons dans le cadre de cette étude doctorale à la visualisation de données volumiques en imagerie médicale et biomédicale, obtenues grâce à des appareils d'acquisition générant des champs scalaires ou vectoriels représentés sous forme de grilles régulières 3D.La taille croissante des données, due à la précision grandissante des appareils d'acquisition, impose d'adapter les algorithmes de visualisation afin de pouvoir gérer de telles volumétries.De plus, les GPUs utilisés en visualisation de données volumiques, se trouvant être particulièrement adaptés à ces problématiques, disposent d'une quantité de mémoire très limitée comparée aux données à visualiser.La question se pose alors de savoir comment dissocier les unités de calculs, permettant la visualisation, de celles de stockage.Les algorithmes se basant sur le principe dit "out-of-core" sont les solutions permettant de gérer de larges ensembles de données volumiques.Dans cette thèse, nous proposons un pipeline complet permettant de visualiser et de traiter, en temps réel sur GPU, des volumes de données dépassant très largement les capacités mémoires des CPU et GPU.L'intérêt de notre pipeline provient de son approche de gestion de données "out-of-core" permettant de virtualiser la mémoire qui se trouve être particulièrement adaptée aux données volumiques.De plus, cette approche repose sur une structure d'adressage virtuel entièrement gérée et maintenue sur GPU.Nous validons notre modèle grâce à plusieurs applications de visualisation et de traitement en temps réel.Tout d'abord, nous proposons un microscope virtuel interactif permettant la visualisation 3D auto-stéréoscopique de piles d'images haute résolution.Puis nous validons l'adaptabilité de notre structure à tous types de données grâce à un microscope virtuel multimodale.Enfin, nous démontrons les capacités multi-rôles de notre structure grâce à une application de visualisation et de traitement concourant en temps réel. / Data visualisation is an essential aspect of scientific research in many fields.It helps to understand observed or even simulated phenomena and to extract information from them for purposes such as experimental validations or solely for project review.The focus given in this thesis is on the visualisation of volume data in medical and biomedical imaging.The acquisition devices used to acquire the data generate scalar or vector fields represented in the form of regular 3D grids.The increasing accuracy of the acquisition devices implies an increasing size of the volume data.Therefore, it requires to adapt the visualisation algorithms in order to be able to manage such volumes.Moreover, visualisation mostly relies on the use of GPUs because they suit well to such problematics.However, they possess a very limited amount of memory compared to the generated volume data.The question then arises as to how to dissociate the calculation units, allowing visualisation, from those of storage.Algorithms based on the so-called "out-of-core" principle are the solutions for managing large volume data sets.In this thesis, we propose a complete GPU-based pipeline allowing real-time visualisation and processing of volume data that are significantly larger than the CPU and GPU memory capacities.The pipeline interest comes from its GPU-based approach of an out-of-core addressing structure, allowing the data virtualisation, which is adequate for volume data management.We validate our approach using different real-time applications of visualisation and processing.First, we propose an interactive virtual microscope allowing 3D auto-stereoscopic visualisation of stacks of high-resolution images.Then, we verify the adaptability of our structure to all data types with a multimodal virtual microscope.Finally, we demonstrate the multi-role capabilities of our structure through a concurrent real-time visualisation and processing application.
576

Convolutional Neural Networks on FPGA and GPU on the Edge: A Comparison

Pettersson, Linus January 2020 (has links)
When asked to implement a neural network application, the decision concerning what hardware platform to use may not always be easily made. This thesis studies various relevant platforms with regards to performance, power efficiency and usability, with the purpose of providing a basis for such decisions. The hardware platforms which are studied were a GPU, an FPGA and a CPU. The project implements Convolutional Neural Networks (CNN) on the different hardware platforms using several tools and frameworks. The final implementation uses BNN-PYNQ for the implementation on the FPGA and CPU, which provided ready-to-run code and overlays for quantized CNNs and fully connected neural networks. Next, these networks are copied using TensorFlow, and optimized to FP32, FP16 and INT8 precision using TensorRT for use on the GPU. The results indicate that the FPGA outperforms the GPU with a factor of 100 for the CNN networks, and a factor of 1000 on the fully connected networks with regards to inference speed. For power efficiency, the FPGA again outperforms the GPU. The thesis concludes that for a neural network application, an FPGA is preferred if performance is a priority. However, the GPU proved to have a greater ease of use due to the many tools and frameworks available. If easy implementation and high design flexibility is a priority, a GPU is instead recommended.
577

Parallelizing Digital Signal Processing for GPU

Ekstam Ljusegren, Hannes, Jonsson, Hannes January 2020 (has links)
Because of the increasing importance of signal processing in today's society, there is a need to easily experiment with new ways to process signals. Usually, fast-performing digital signal processing is done with special-purpose hardware that are difficult to develop for. GPUs pose an alternative for fast performing digital signal processing. The work in this thesis is an analysis and implementation of a GPU version of a digital signal processing chain provided by SAAB. Through an iterative process of development and testing, a final implementation was achieved. Two benchmarks, both comprised of 4.2 M test samples, were made to compare the CPU implementation with the GPU implementation. The benchmark was run on three different platforms: a desktop computer, a NVIDIA Jetson AGX Xavier and a NVIDIA Jetson TX2. The results show that the parallelized version can reach several magnitudes higher throughput than the CPU implementation.
578

Att procedurellt generera ett 2D landskap parallellt på GPU vs seriellt på CPU / To procedurally generate a 2D landscape in parallell on the GPU vs serially on the CPU

Wahlberg, Björn January 2019 (has links)
Procedurellt genererat innehåll, PCG,förekommer väldigt ofta i spel nu för tiden, mycket för att öka återspelbarheten i ett spel. Några populära exempel på spel som utnyttjar PCG är Terraria(2011) och Minecraft(2011). I takt med att hårdvara blir mer och mer kraftfull så ökar även kraven på spelen som utnyttjar teknikerna eftersom att det går att generera innehåll i realtid. Men finns det outnyttjat potential i grafikkortet? Trenden av ökningen av klockfrekvensen på processorer har reducerats på senare tid, för att istället ersättas av ett större antal kärnor. Här så kan parallellisering av programkod utnyttjas för att utvinna mer ur datorns hårdvara. Ett teknologi-orienterat experiment att utfördes på först en seriell CPUlösning, och sedan en parallell GPUlösning för att undersöka hur lång tid varje metod tog. Detta skedde på varierande stora kartor för att kunna fastställa om det fanns ett samband mellan storlek och tid. Genomförandet använde sig av SFML biblioteket för att implementera GPU varianten där en fragment shader användes för att utföra alla parallella uträkningar för kartgenreringen. CPU metoden använde samma tekniker som GPU metoden, fast utan någon parallellisering. Båda teknikerna validerades genom att använda SFML för att rita ut kartorna som de genererar med enkelgrafik.
579

Analýza a optimalizace GPU kernelů pomocí strojového učení / Analysing and Optimizing GPU Kernels with Machine Learning

Šťavík, Petr January 2020 (has links)
Graphics processing units (GPUs) were originally used solely for the purpose of graph- ics rendering. This changed with the introduction of technologies like CUDA that enabled to use graphics processors as any other computing device. However, writing an efficient program for GPUs, also called GPU kernel, is one of the most difficult programming disciplines. The latest research in the field suggests that these difficulties could be po- tentially mitigated with machine learning methods. One especially successful approach is based on the utilization of recurrent neural networks (RNNs) over different representa- tions of source code. In this work, we present two RNN-based solutions that are able to derive performance characteristics of a CUDA GPU kernel directly from its intermediate representation called PTX. We assess the applicability of our two methods in two GPU op- timization tasks. In heterogeneous device mapping task, our methods are able to achieve accuracies of around 82%, results that are slightly worse than the current state of the art. In a more challenging achieved occupancy task, where the goal is to correctly predict one out of ten classes, our two methods achieve accuracies above 50%. These promising results indicate great potential in additional research focused in a similar direction. 1
580

Disc : Approximative Nearest Neighbor Search using Ellipsoids for Photon Mapping on GPUs / Disc : Approximativ närmaste grannsökning med ellipsoider för fotonmappning på GPU:er

Bergholm, Marcus, Kronvall, Viktor January 2016 (has links)
Recent development in Graphics Processing Units (GPUs) has enabled inexpensive high-performance computing for general-purpose applications. The K-Nearest Neighbors problem is widely used in applications ranging from classification to gathering of photons in the Photon Mapping algorithm. Using the euclidean distance measure when gathering photons can cause false bleeding of colors between surfaces. Ellipsoidical search boundaries for photon gathering are shown to reduce artifacts due to this false bleeding. Shifted Sorting has been found to yield high performance on GPUs while simultaneously retaining a high approximation rate. This study presents an algorithm for approximatively solving the K-Nearest Neighbors problem modified to use a distance measure creating an ellipsoidical search boundary. The ellipsoidical search boundary is used to alleviate the issue of false bleeding of colors between surfaces in Photon Mapping. The Approximative K-Nearest Neighbors algorithm presented is a modification of the Shifted Sorting algorithm. The algorithm is found to be highly parallelizable and performs to a factor of 86% queries processed per millisecond compared to a reference implementation using spherical search boundaries implied by the euclidean distance. The rate of compression from spherical to ellipsoidical search boundary is appropriately chosen in the range 3.0 to 7.0. The algorithm is found to scale well in respect to increases in both number of data points and number of query points. / Grafikprocessorer (GPU-er) har på senare tid möjliggjort högprestandaberäkningar till låga kostnader för generella applikationer. K-Nearest Neighbors problemet har vida applikationsområden, från klassifikation inom maskininlärning till insamlande av fotoner i Photon Mapping för rendering av tredimensionella scener. Användning av euklidiska avstånd vid insamling av fotoner kan leda till en felaktig bladning av färger mellan ytor. Ellipsoidiska sökområden vid fotoninsamling har visats reducera artefakter oraskade av denna typ av felaktiga färgutblandning. Shifted Sorting har visats ge hög prestanda på GPU-er utan att förlora kvalitet av approximationsgrad. Denna rapport undersöker hur den approximativa varianten av K-Nearest Neighborsalgoritmen med Shifted Sorting presterar på GPU-er med avståndsmåttet modifierat sådant att ett ellipsoidiskt sökområde bildas. Algoritmen används för att reduceras problemet av felaktig blanding av färg i Photon Mapping. Algoritmen visas vara mycket parallelliserbar och presterar till en grad av 86% behandlade sökpunkter per millisekund i jämförelse med en referensimplementation som använder sfäriska sökområden. Kompressionsgraden längs sökpunktens ytnormal väljs fördelaktligen till ett värde i intervallet 3,0 till 7,0. Algoritmen visas skala väl med avseende på både ökningar i antal data punkter och antal sökpunkter.

Page generated in 0.1759 seconds