Global ETD Search

661	Otimização de multidões em jogos digitais utilizando CUDA Bardella, Tiago Ungaro 19 October 2015 Made available in DSpace on 2016-03-15T19:38:03Z (GMT). / Digital games have, since their beginnings, featured many types of enemy models to confront and many types of characters to control, as in Real-Time Strategy games. The large groups of models that make up an important scene are called crowds. Crowds demand high computing power and specific handling algorithms, to avoid the loss of immersion caused by problems that arise when crowds are not treated properly. With the emergence of graphics card programming platforms such as NVIDIA CUDA, new algorithms were created to improve the performance of crowds in digital games, and several studies have demonstrated their superiority over methods based on sequential programming. The goal of this work is to build on these GPU techniques to implement a new API using CUDA that improves on existing crowd algorithms for digital games in terms of performance and simplicity of implementation. With the project concluded, the resulting API has made crowd handling easier for digital game developers using the Unity3D game engine integrated with the TBX crowd simulation API: they now only need to include a DLL in their project instead of writing a crowd-handling algorithm from scratch, which takes considerable development time. / O histórico dos jogos digitais apresenta, desde seu princípio, jogos que utilizam diversos modelos de inimigos para enfrentar ou diversos modelos de personagens para controlar, como os jogos Real-Time Strategy por exemplo. Essas grandes quantidades de modelos que compõem uma cena importante são chamadas de multidões. 
As multidões necessitam de um alto poder computacional e algoritmos específicos para seu tratamento para evitar a perda de imersão dentro de um jogo pelos problemas que podem acontecer caso as multidões não sejam tratadas adequadamente. Com o surgimento de linguagens de placas gráficas como a NVIDIA CUDA, novos algoritmos foram criados para melhor trabalhar com o desempenho de multidões em jogos digitais e sua superioridade em comparação com os métodos utilizados em programação sequencial foi comprovada em diversos estudos. O objetivo deste trabalho é se basear nestas técnicas de GPU para implementar uma nova API usando tecnologia CUDA que visa melhorar os algoritmos existentes para tratamento de multidões em jogos digitais em termos de desempenho e simplicidade de implementação. Com a conclusão do projeto, a API criada facilitou o tratamento de multidões para desenvolvedores de jogos digitais com a game engine Unity3D integrada com a API TBX de simulação de multidões, que agora apenas precisam incluir uma DLL em seu projeto ao invés de criar um algoritmo próprio de tratamento de multidões do início, o que demanda tempo de desenvolvimento. multidões virtuais jogos digitais GPU (Graphics Processing Unit) Unity3D TBX (Techbizxccelerator) virtual crowds digital games GPU (Graphics Processing Unit) Unity3D TBX (Techbizxccelerator) CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
662	A Runtime Framework for Regular and Irregular Message-Driven Parallel Applications on GPU Systems Rengasamy, Vasudevan January 2014 The effective use of GPUs for accelerating applications depends on a number of factors, including effective asynchronous use of heterogeneous resources, reducing data transfer between CPU and GPU, increasing occupancy of GPU kernels, overlapping data transfers with computations, reducing GPU idling, and kernel optimizations. Overcoming these challenges requires considerable effort on the part of application developers. Most optimization strategies are proposed and tuned specifically for individual applications. Message-driven execution with over-decomposition of tasks constitutes an important model for parallel programming and provides multiple benefits, including communication-computation overlap and reduced idling on resources. Charm++ is one such message-driven language, which employs over-decomposition of tasks, computation-communication overlap and a measurement-based load balancer to achieve high CPU utilization. This research has developed an adaptive runtime framework for efficient execution of Charm++ message-driven parallel applications on GPU systems. In the first part of our research, we developed a runtime framework, G-Charm, with the focus primarily on optimizing regular applications. At runtime, G-Charm automatically combines multiple small GPU tasks into a single larger kernel, which reduces the number of kernel invocations while improving CUDA occupancy. G-Charm also enables reuse of existing data in GPU global memory, and performs GPU memory management and dynamic scheduling of tasks across CPU and GPU in order to reduce idle time. To combine the partial results obtained from the computations performed on CPU and GPU, G-Charm allows the user to specify an operator with which the partial results are combined at runtime. 
We also perform compile-time code generation to reduce programming overhead. For Cholesky factorization, a regular parallel application, G-Charm provides a 14% improvement over a highly tuned implementation. In the second part of our research, we extended our runtime to overcome the challenges presented by irregular applications, such as aperiodic generation of tasks, irregular memory access patterns and varying workloads during application execution. We developed models for deciding the number of tasks that can be combined into a kernel based on the rate of task generation and the GPU occupancy of the tasks. For irregular applications, data reuse results in uncoalesced GPU memory access. We evaluated the effect of altering the global memory access pattern in improving coalesced access. We have also developed adaptive methods for hybrid execution on CPU and GPU, in which we consider the varying workloads while scheduling tasks across the CPU and GPU. We demonstrate that our dynamic strategies result in an 8-38% reduction in execution times for an N-body simulation application and a molecular dynamics application over the corresponding static strategies that are suited to regular applications. Graphics Processing Unit (GPU) Parallel Programming (Computer Science) Parallel Programming Models Parallel Programming Frameworks Charm++ (Computer Program Language) HybridAPI-GPU Management Framework G-Charm Framework Accelerator Based Computing Cholesky Factorization Computer Science
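The two runtime ideas in record 662 (merging many small GPU tasks into fewer, larger kernel launches, and merging CPU and GPU partial results with a user-supplied operator) can be pictured with a small host-side sketch. This is only an illustration in plain Python: the names `batch_tasks`, `run_batch` and `combine_partials` are hypothetical and are not part of G-Charm or Charm++, and the "kernel" is an ordinary function standing in for a GPU launch.

```python
from functools import reduce
import operator


def batch_tasks(tasks, max_batch):
    """Group small tasks into batches; each batch stands in for one combined kernel launch."""
    return [tasks[i:i + max_batch] for i in range(0, len(tasks), max_batch)]


def run_batch(batch):
    """Stand-in for a combined kernel: one call processes every task in the batch.

    The per-task work (a sum of squares) is an arbitrary placeholder.
    """
    return [sum(x * x for x in task) for task in batch]


def combine_partials(cpu_partials, gpu_partials, op):
    """Merge partial results from both devices with a user-supplied associative operator."""
    return reduce(op, list(cpu_partials) + list(gpu_partials))


# Five small tasks become three "launches" when at most two tasks share a kernel.
tasks = [[1, 2], [3], [4, 5], [6], [7, 8]]
batches = batch_tasks(tasks, max_batch=2)
launches = len(batches)

# Hypothetical split: the first two batches run on the "GPU", the rest on the "CPU".
gpu_partials = [r for b in batches[:2] for r in run_batch(b)]
cpu_partials = [r for b in batches[2:] for r in run_batch(b)]
total = combine_partials(cpu_partials, gpu_partials, operator.add)
```

The operator has to be associative for the result to be independent of how tasks were split across devices; addition is used here only as the simplest such choice.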
663	Trådlösa Nätverk : säkerhet och GPU de Laval, Johnny January 2009 Trådlösa nätverk är av naturen sårbara för avlyssning för att kommunikationen sker med radiovågor. Därför skyddas trådlösa nätverk med kryptering. WEP var den första krypteringsstandarden som användes av en bredare publik som senare visade sig innehålla flera sårbarheter. Följden blev att krypteringen kunde förbigås på ett par minuter. Därför utvecklades WPA som ett svar till sårbarheterna i WEP. Kort därefter kom WPA2 som är den standard som används i nutid. Den svaghet som kan påvisas med WPA2 finns hos WPA2-PSK när svaga lösenord används. Mjukvaror kan med enkelhet gå igenom stora uppslagsverk för att testa om lösenord går att återställa. Det är en process som tar tid och som därför skyddar nätverken i viss mån. Dock har grafikprocessorer börjat användas i syfte för att återställa lösenord. Grafikkorten är effektivare och återställer svaga lösenord betydligt snabbare än moderkortens processorer. Det öppnar upp för att jämföra lösenord med ännu större uppslagsverk och fler kombinationer. Det är vad denna studie avser att belysa; hur har grafikkortens effektivitet påverkat säkerheten i trådlösa nätverk ur ett verksamhetsperspektiv. / Wireless networks are inherently vulnerable to eavesdropping because they communicate over radio waves, and are therefore protected by encryption. WEP was the first encryption standard in wide use. Unfortunately, WEP proved to have several serious vulnerabilities and could be circumvented within a few minutes. WPA was developed in response to the weaknesses of WEP. Shortly thereafter WPA2 was released, and it is the standard in use today. The demonstrable weakness of WPA2 lies in WPA2-PSK when weak passwords are used. Software can easily run through large dictionaries to test whether a password can be recovered, but that process takes time, which gives wireless networks a limited degree of protection. 
However, graphics processors have begun to be used to recover passwords. Graphics cards are more efficient and recover weak passwords considerably faster than the ordinary processor on the motherboard. This makes it feasible to test passwords against even larger dictionaries and more combinations of words. This study aims to shed light on how the efficiency of graphics cards has affected security in wireless networks from a corporate perspective. Wireless network GPU CPU graphic cards security processor WEP WPA WPA2 Elcomsoft Trådlösa nätverk GPU CPU grafikkort processorer säkerhet WEP WPA WPA2 Elcomsoft Information Systems Information Systems
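To make concrete why testing a dictionary against WPA2-PSK is slow on a CPU and attractive to parallelize, the sketch below shows the standard key derivation: the 256-bit pairwise master key is computed from the passphrase with PBKDF2-HMAC-SHA1, using the SSID as salt and 4096 iterations. The abstract does not describe this step itself, so treat it as background. The helper names `derive_pmk` and `audit_passphrases` and the sample values are hypothetical, and comparing derived keys directly is a simplification for auditing a network whose key you already hold.

```python
import hashlib


def derive_pmk(passphrase: str, ssid: str) -> bytes:
    """Derive the 32-byte WPA2-PSK pairwise master key from a passphrase and SSID."""
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode("utf-8"),
                               ssid.encode("utf-8"), 4096, 32)


def audit_passphrases(candidates, ssid, known_pmk):
    """Return the first candidate whose derived key equals known_pmk, else None.

    Every candidate costs a full 4096-iteration derivation, and candidates are
    independent of each other, which is why the work parallelizes well.
    """
    for candidate in candidates:
        if derive_pmk(candidate, ssid) == known_pmk:
            return candidate
    return None


# Illustrative values only: a weak passphrase is found in a tiny word list.
ssid = "example-network"
known = derive_pmk("sunshine", ssid)
found = audit_passphrases(["letmein", "password", "sunshine"], ssid, known)
```

Because the SSID is the salt, a table of derived keys computed for one network name cannot be reused for another.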
664	Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices Pandit, Prasanna Vasant January 2013 Computing systems have become heterogeneous with the increasing prevalence of multi-core CPUs, Graphics Processing Units (GPUs) and other accelerators in them. OpenCL has emerged as an attractive programming framework for heterogeneous systems. However, utilizing multiple devices in OpenCL is a challenge as it requires the programmer to explicitly map data and computation to each device. Utilizing multiple devices simultaneously to speed up execution of a kernel is even more complex, as the relative execution time of the kernel on different devices can vary significantly. Also, after each kernel execution, a coherent version of the data needs to be established. This means that, in order to utilize all devices effectively, the programmer has to spend considerable time and effort to distribute work across all devices, keep track of modified data in these devices and correctly perform a merging step to put the data together. Further, the relative performance of a program may vary across different inputs, which means a statically determined work distribution may not work well. In this work, we present FluidiCL, an OpenCL runtime that takes a program written for a single device and uses multiple heterogeneous devices to execute each kernel. The runtime performs dynamic work distribution and cooperatively executes each kernel on all available devices. Since we consider a setup with devices having discrete address spaces, our solution ensures that execution of OpenCL work-groups on devices is adjusted by taking into account the overheads for data management. The data transfers and data merging needed to ensure coherence are handled transparently without requiring any effort from the programmer. FluidiCL also does not require prior training or profiling and is completely portable across different machines. 
Because it is dynamic, the runtime is able to adapt to system load. We have developed several optimizations for improving the performance of FluidiCL. We evaluate the runtime across different sets of devices. On a machine with an Intel quad-core processor and an NVidia Fermi GPU, FluidiCL shows a geomean speedup of nearly 64% over the GPU, 88% over the CPU and 14% over the best of the two devices in each benchmark. In all benchmarks, performance of our runtime comes to within 13% of the best of the two devices. FluidiCL shows similar results on a machine with a quad-core CPU and an NVidia Kepler GPU, with up to 26% speedup over the best of the two. We also present results considering an Intel Xeon Phi accelerator and a CPU and find that FluidiCL performs up to 45% faster than the best of the two devices. We extend FluidiCL from a CPU–GPU scenario to a three-device setup having a quad-core CPU, an NVidia Kepler GPU and an Intel Xeon Phi accelerator and find that FluidiCL obtains a geomean improvement of 6% in kernel execution time over the best of the three devices considered in each case. Heterogeneous Computers Open Computing Language FluidiCL Fluidic Kernels OpenCL Application Programming Interface Graphics Processing Unit (GPU) Central Processing Unit (CPU) Computer Architecture FluidiCL Runtime Heterogeneous OpenCL Runtime OpenCL Programs CPU–GPU Systems Computer Engineering
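Dynamic work distribution of a single kernel across two devices, as described in record 664, can be pictured with a toy simulation. The scheme below is an assumption made for illustration and should not be read as FluidiCL's actual algorithm: two devices claim chunks of work-groups from opposite ends of the index range, and whichever device is free first claims the next chunk. The function name, the rates and the chunk size are all hypothetical, and data-transfer overheads are ignored.

```python
def cooperative_split(num_groups, gpu_rate, cpu_rate, chunk):
    """Simulate two devices consuming work-groups from opposite ends of a range.

    Rates are work-groups per time unit. Returns the work-groups finished by
    each device and the overall completion time.
    """
    lo, hi = 0, num_groups            # unclaimed work-groups are [lo, hi)
    gpu_time = cpu_time = 0.0         # time at which each device is next free
    gpu_done = cpu_done = 0
    while lo < hi:
        take = min(chunk, hi - lo)
        if gpu_time <= cpu_time:      # GPU is free first: claim from the low end
            lo += take
            gpu_done += take
            gpu_time += take / gpu_rate
        else:                         # CPU is free first: claim from the high end
            hi -= take
            cpu_done += take
            cpu_time += take / cpu_rate
    return gpu_done, cpu_done, max(gpu_time, cpu_time)


# A GPU four times faster than the CPU ends up with most of the work-groups,
# without the split having been fixed in advance.
result = cooperative_split(num_groups=80, gpu_rate=4, cpu_rate=1, chunk=8)
```

The point of the sketch is that the split emerges from relative speed at run time, which is what lets such a runtime adapt to different inputs and system load without prior profiling.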
665	Schémas numériques adaptés aux accélérateurs multicoeurs pour les écoulements bifluides / Numerical simulations of two-fluid flow on multicores accelerator Jung, Jonathan 28 October 2013 Cette thèse traite de la modélisation et de l'approximation numérique des écoulements liquide-gaz compressibles. La difficulté centrale est la modélisation et l'approximation de l'interface liquide-gaz. Le modèle bifluide est constitué d'un système de lois de conservation fermé par une loi d'état du mélange. La loi d'état conditionne les bonnes propriétés (hyperbolicité, existence d'une entropie de Lax) du système. Les schémas classiques de type Godunov conduisent à des imprécisions les rendant inutilisables en pratique. L'existence de solutions discontinues rend difficile la construction de schémas d'ordre élevé et nécessite des maillages très fins pour une précision acceptable. Il est indispensable de proposer des algorithmes performants pour les calculateurs parallèles les plus récents. Nous aborderons chacune de ces problématiques: construction d'une "bonne" loi de pression, construction de schémas numériques adaptés, programmation sur calculateur massivement multicoeur. / This thesis deals with the modeling and numerical approximation of compressible gas-liquid flows. The main difficulty lies in the modeling and approximation of the liquid-gas interface. The two-fluid model is a system of conservation laws closed with a mixture pressure law. The pressure law has to be chosen carefully, since it determines whether the system has good properties such as hyperbolicity and the existence of a Lax entropy. Classic conservative Godunov-type schemes lead to inaccuracies that make them unusable in practice. The existence of discontinuous solutions makes it difficult to build high-order schemes and requires very fine meshes to reach an acceptable accuracy. It is therefore essential to provide efficient algorithms for high-performance computing. 
In this thesis, we partially address each of these issues: construction of a "good" pressure law, construction of adapted numerical schemes, and programming on a GPU or GPU cluster. Écoulements bifluides Schéma Lagrange-projection Schéma ALE-projection Projection aléatoire Ensemble d'hyperbolicité non convexe Entropie de mélange OpenCL GPU MPI Two-fluid flow Lagrange-projection scheme ALE-projection scheme Random sampling Non convex hyperbolic set Mixture entropy OpenCL GPU MPI 532.5 530.15 620
666	Résolution numérique de l'opérateur de gyromoyenne, schémas d'advection et couplage : applications à l'équation de Vlasov / Numerical methods for the gyroaverage operator, advection schemes and coupling : applications to the Vlasov equation Steiner, Christophe 11 December 2014 Cette thèse propose et analyse des méthodes numériques pour la résolution de l'équation de Vlasov. Cette équation modélise l'évolution d'une espèce de particules chargées sous l'effet d'un champ électromagnétique. La première partie est consacrée à une analyse mathématique de schémas semi-Lagrangiens résolvant l'équation de transport linéaire qui constituent la brique de base des méthodes de splitting directionnel. Des méthodes de résolution de l'équation de Vlasov couplée à l'équation de Poisson, dans le cas où uniquement le champ électrique est considéré, sont optimisées dans la seconde partie. Il s'agit d'optimisation en temps de calcul par l'utilisation de cartes graphiques (GPU) et l'utilisation d'un maillage non homogène. Dans la troisième et dernière partie, nous étudierons une méthode numérique de calcul de l'opérateur de gyromoyenne intervenant dans la théorie gyrocinétique que nous appliquerons à l'équation de quasi-neutralité. / This thesis proposes and analyzes numerical methods for solving the Vlasov equation. This equation models the evolution of a species of charged particles under the effect of an electromagnetic field. The first part is devoted to a mathematical analysis of semi-Lagrangian schemes for the linear transport equation, which is the basic building block of directional splitting methods. Methods for solving the Vlasov equation coupled to the Poisson equation, in the case where only the electric field is considered, are optimized in the second part. 
The optimization targets computation time, through the use of Graphics Processing Units (GPUs) and an inhomogeneous mesh. In the third and final part, we study a numerical method for calculating the gyroaverage operator involved in gyrokinetic theory. This method will be applied to solve the quasi-neutrality equation. Equation de Vlasov Méthodes semi-Lagrangiennes Equations équivalentes Superconvergence GPU Gyromoyenne Equation de quasi-neutralité Modèle gyrocinétique Vlasov equation Semi-Lagrangian methods Equivalent equations Superconvergence GPU Gyrokinetic model Gyroaverage Quasi-neutrality equation 515 518 533.7
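The semi-Lagrangian idea analyzed in record 666 can be shown on the simplest case, the 1D linear transport equation u_t + a u_x = 0 on a periodic uniform grid: each grid value is updated by following the characteristic back over one time step and interpolating the old solution at its foot. The sketch below uses linear interpolation, which is an assumption chosen for brevity; the thesis may well study other, higher-order interpolants, and the function name is hypothetical.

```python
import math


def semi_lagrangian_step(u, a, dt, dx):
    """Advance u_t + a*u_x = 0 by one step on a periodic uniform grid.

    For each node i the characteristic is traced back to x_i - a*dt and the
    old solution is linearly interpolated there.
    """
    n = len(u)
    shift = a * dt / dx               # displacement measured in grid cells
    new_u = []
    for i in range(n):
        x = i - shift                 # foot of the characteristic, in index units
        j = math.floor(x)             # grid node to the left of the foot
        w = x - j                     # interpolation weight in [0, 1)
        new_u.append((1.0 - w) * u[j % n] + w * u[(j + 1) % n])
    return new_u


# When the displacement is a whole number of cells the step is an exact shift.
shifted = semi_lagrangian_step([0, 1, 2, 3, 4], a=1.0, dt=2.0, dx=1.0)

# A fractional displacement blends neighbouring values; the total is preserved.
blended = semi_lagrangian_step([0.0, 1.0, 0.0, 0.0], a=1.0, dt=0.5, dx=1.0)
```

Unlike explicit Eulerian schemes, the time step here is not restricted by a CFL condition, since the foot of the characteristic may lie any number of cells away. Each node is also updated independently of the others, which is part of what makes this family of schemes a natural fit for GPUs.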
667	探索類神經網路於網路流量異常偵測中的時效性需求 / Exploring the timeliness requirement of artificial neural networks in network traffic anomaly detection 連茂棋, Lian, Mao-Ci Unknown Date 雲端的盛行使得人們做任何事都要透過網路，但是總會有些有心人士使用一些惡意程式來創造攻擊或通過網絡連接竊取資料。為了防止這些網路惡意攻擊，我們必須不斷檢查網路流量資料，然而現在這個雲端時代，網路的資料是非常龐大且複雜，若要檢查所有網路資料不僅耗時而且非常沒有效率。本研究使用TensorFlow與多個圖形處理器(Graphics Processing Unit, GPU)來實作類神經網路(Artificial Neural Networks, ANN)機制，用以分析網路流量資料，並得到一個可以判斷正常與異常網路流量的偵測規則，也設計一個實驗來驗證我們提出的類神經網路機制是否符合網路流向異常偵測的時效性和有效性。在實驗過程中，我們發現使用更多的GPU可以減少訓練類神經網路的時間，並且在我們的實驗設計中使用三個GPU進行運算可以達到網路流量異常偵測的時效性。透過該方法得到的初步實驗結果，我們提出機制的結果優於使用反向傳播算法訓練類神經網路得到的結果。 / With the spread of cloud computing, people now do almost everything over the Internet, but some people with bad intentions use malicious programs to launch attacks or steal information through network connections. To prevent these cyber-attacks, network traffic data must be checked continuously. However, in the current cloud environment, network data is so large and complex that checking all of it is both time-consuming and inefficient. This study uses TensorFlow with multiple Graphics Processing Units (GPUs) to implement an Artificial Neural Network (ANN) mechanism that analyzes network traffic data and derives detection rules able to distinguish normal from malicious traffic, a task we call Network Traffic Anomaly Detection (NTAD). Experiments are also designed to verify the timeliness and effectiveness of the derived ANN mechanism. During the experiments, we found that using more GPUs reduces training time, and that using three GPUs meets the timeliness requirement of NTAD. In our preliminary experiments, the proposed mechanism performed better than an ANN trained with the back-propagation algorithm. 網路流量異常偵測機器學習 GPU平行運算類神經網絡張量流 Network traffic anomaly detection Machine learning GPU parallel operation Artificial neural networks TensorFlow
668	Modèles de représentation multi-résolution pour le rendu photo-réaliste de matériaux complexes Baril, Jérôme 11 January 2010 The emergence of digital capture devices has enabled the development of 3D acquisition to scan the properties of a real object: its shape and its appearance. This process provides a dense and accurate representation of real objects and avoids the costly process of physical simulation to model an object. Thus, the issues have evolved and no longer focus only on modeling the characteristics of a real object, but also on processing acquired data to integrate a copy of reality into an image synthesis process. In this thesis, we propose new representations for appearance functions obtained from acquisition, with the aim of defining a set of multiscale models of low complexity in size that can be displayed in real time on today's graphics hardware. / L'émergence des périphériques de capture numériques ont permis le développement de l'acquisition 3D pour numériser les propriétés d'un objet réel : sa forme et son apparence. Ce processus fournit une représentation dense et précise d'objets réels et permet de s'abstraire d'un processus de simulation physique coûteux pour modéliser un objet. Ainsi, les problématiques ont évolué et portent non plus uniquement sur la modélisation des caractéristiques d'un objet réel mais sur les traitements de données issues de l'acquisition pour intégrer une copie de la réalité dans un processus de synthèse d'images. Dans ces travaux de thèse, nous proposons de nouvelles représentations pour les fonctions d'apparence issues de l'acquisition dont le but est de définir un ensemble de modèles multi-échelles, de faible complexité en taille, capable d'être visualisé en temps réel sur le matériel graphique actuel. 
Informatique graphique Synthèse d'images Rendu Matériaux Rendu temps-réel Apparence Multi-résolution Ondelettes GPU BRDF SVBRDF BTF PTM Approximation Computer graphic Image synthesis Rendering Material Real time rendering Appearance Multi-resolution Wavelets GPU BRDF SVBRDF BTF PTM Approximation
669	Solving incompressible Navier-Stokes equations on heterogeneous parallel architectures / Résolution des équations de Navier-Stokes incompressibles sur architectures parallèles hétérogènes Wang, Yushan 09 April 2015 Dans cette thèse, nous présentons notre travail de recherche dans le domaine du calcul haute performance en mécanique des fluides. Avec la demande croissante de simulations à haute résolution, il est devenu important de développer des solveurs numériques pouvant tirer parti des architectures récentes comprenant des processeurs multi-cœurs et des accélérateurs. Nous nous proposons dans cette thèse de développer un solveur efficace pour la résolution sur architectures hétérogènes CPU/GPU des équations de Navier-Stokes (NS) relatives aux écoulements 3D de fluides incompressibles. Tout d'abord nous présentons un aperçu de la mécanique des fluides avec les équations de NS pour fluides incompressibles et nous présentons les méthodes numériques existantes. Nous décrivons ensuite le modèle mathématique, et la méthode numérique choisie qui repose sur une technique de prédiction-projection incrémentale. Nous obtenons une distribution équilibrée de la charge de calcul en utilisant une méthode de décomposition de domaines. Une parallélisation à deux niveaux combinée avec de la vectorisation SIMD est utilisée dans notre implémentation pour exploiter au mieux les capacités des machines multi-cœurs. Des expérimentations numériques sur différentes architectures parallèles montrent que notre solveur NS obtient des performances satisfaisantes et un bon passage à l'échelle. Pour améliorer encore la performance de notre solveur NS, nous intégrons le calcul sur GPU pour accélérer les tâches les plus coûteuses en temps de calcul. 
Le solveur qui en résulte peut être configuré et exécuté sur diverses architectures hétérogènes en spécifiant le nombre de processus MPI, de threads, et de GPUs. Nous incluons également dans ce manuscrit des résultats de simulations numériques pour des benchmarks conçus à partir de cas tests physiques réels. Les résultats obtenus par notre solveur sont comparés avec des résultats de référence. Notre solveur a vocation à être intégré dans une future bibliothèque de mécanique des fluides pour le calcul sur architectures parallèles CPU/GPU. / In this PhD thesis, we present our research in the domain of high-performance software for computational fluid dynamics (CFD). With the increasing demand for high-resolution simulations, there is a need for numerical solvers that can take full advantage of current manycore accelerated parallel architectures. In this thesis we focus more specifically on developing an efficient parallel solver for the 3D incompressible Navier-Stokes (NS) equations on heterogeneous CPU/GPU architectures. We first present an overview of the CFD domain along with the NS equations for incompressible fluid flows and existing numerical methods. We describe the mathematical model and the numerical method that we chose, based on an incremental prediction-projection method. A balanced distribution of the computational workload is obtained by using a domain decomposition method. A two-level parallelization combined with SIMD vectorization is used in our implementation to take advantage of current distributed multicore machines. Numerical experiments on various parallel architectures show that this solver provides satisfactory performance and good scalability. To further improve the performance of the NS solver, we integrate GPU computing to accelerate the most time-consuming tasks. The resulting solver can be configured to run on various heterogeneous architectures by explicitly specifying the numbers of MPI processes, threads and GPUs. 
This thesis manuscript also includes simulation results for two benchmarks designed from real physical cases. The computed solutions are compared with existing reference results. The code developed in this work will be the base for a future CFD library for parallel CPU/GPU computations. Équations de Navier-Stokes Méthode de prédiction-projection Calcul haute performance Parallélisation multi-niveaux Calcul sur GPU Navier-Stokes equations Prediction-projection method Helmholtz solver Poisson solver High performance computing Multi-level parallelization GPU computing
670	Sketch-based intuitive 3D model deformations Bao, Xin January 2014 In 3D modelling software, deformations are used to add, remove, or modify geometric features of existing 3D models to create new models with similar but slightly different details. Traditional techniques for deforming virtual 3D models require users to explicitly define control points and regions of interest (ROIs), and to define precisely how to deform ROIs using control points. The awkwardness of defining these factors in traditional 3D modelling software makes it difficult for people with limited experience of 3D modelling to deform existing 3D models as they expect. As applications which require virtual 3D model processing become more and more widespread, it becomes increasingly desirable to lower the "difficulty of use" threshold of 3D model deformations for users. This thesis argues that the user experience of a user interface for deforming virtual 3D models, in terms of intuitiveness and ease of use, can be greatly enhanced by employing sketch-based 3D model deformation techniques, which require a minimal amount of interaction while preserving both the plausibility of the deformation results and the responsiveness of the algorithms on modern consumer-grade computing devices. A prototype system for sketch-based 3D model deformations is developed and implemented to support this hypothesis. It allows the user to perform a deformation using a single deforming stroke, eliminating the need to explicitly select control points, the ROI and the deforming operation. GPU-based acceleration has been employed to optimise the runtime performance of the system, so that the system is responsive enough for real-time interactions. Studies of the runtime performance and the usability of the prototype system are conducted to provide evidence for the hypothesis. 006.6
