  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Systèmes de communications multi-utilisateurs : de la gestion d'interférence au codage réseau / Multi-user communication systems : from interference management to network coding

Mejri, Asma 13 December 2013 (has links)
This work is dedicated to the analysis, design, and performance evaluation of physical-layer network coding strategies in multiuser communication systems. The first part is devoted to studying the compute-and-forward (CF) protocol in the basic multiple access channel. For this strategy, we propose an optimal solution for designing efficient network codes based on solving a lattice shortest vector problem. Moreover, we derive novel bounds on the ergodic rate and the outage probability for the CF operating in fast and slow fading channels, respectively. In addition, we develop novel decoding algorithms shown numerically to outperform the traditional decoding scheme for the CF.
The second part is dedicated to the design and end-to-end performance evaluation of network codes for the CF and analog network coding in the two-way relay channel and the multi-source multi-relay channel. For each network model, we study decoding at the relay nodes and the end destination, propose search algorithms for optimal network codes for the CF, and evaluate, theoretically and numerically, the end-to-end error rate and achievable transmission rate. In the last part, we study new decoders for the distributed MIMO channel termed integer-forcing (IF) receivers. Inspired by the CF, IF receivers take advantage of the interference provided by the wireless medium to decode integer linear combinations of the original codewords. We develop efficient algorithms to select optimal IF receiver parameters, allowing them to outperform existing suboptimal linear receivers.
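The coefficient-selection step mentioned above can be illustrated concretely. The following is a minimal sketch, not the thesis's own algorithm: it brute-forces the effective-noise quadratic form from the Nazer–Gastpar computation rate over small integer vectors, whereas the thesis reduces the same problem to a lattice shortest-vector search; the names `cf_coeff_search`, `h`, and `P` are illustrative.

```python
import itertools
import numpy as np

def cf_coeff_search(h, P, amax=4):
    """Brute-force search for the integer coefficient vector a that
    minimises the compute-and-forward effective-noise quadratic form
    (the quantity a lattice shortest-vector solver would minimise)."""
    best_a, best_val = None, np.inf
    n = len(h)
    for a in itertools.product(range(-amax, amax + 1), repeat=n):
        a = np.array(a, dtype=float)
        if not a.any():
            continue  # a = 0 decodes nothing and is excluded
        # effective noise: ||a||^2 - P (h.a)^2 / (1 + P ||h||^2)
        val = a @ a - P * (h @ a) ** 2 / (1 + P * (h @ h))
        if val < best_val:
            best_a, best_val = a, val
    return best_a, best_val

h = np.array([1.0, 0.5])            # example channel vector
a, noise = cf_coeff_search(h, P=10.0)
```

The brute force is exponential in the dimension, which is exactly why the thesis's reduction to a shortest-vector problem (solvable with lattice-reduction tools) matters.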
2

Enhanced Capabilities of the Spike Algorithm and a New Spike-OpenMP Solver

Spring, Braegan S 07 November 2014 (has links) (PDF)
SPIKE is a parallel algorithm to solve block tridiagonal matrices. In this work, two useful improvements to the algorithm are proposed. A flexible threading strategy is developed to overcome limitations of the recursive reduced-system method. Allocating multiple threads to some tasks created by the SPIKE algorithm removes the previous restriction that recursive SPIKE may only use a number of threads equal to a power of two. Additionally, a method of solving transpose problems is shown. This method matches the performance of the non-transpose solve while reusing the original factorization.
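The partitioned structure SPIKE exploits can be sketched in a toy two-partition form. This is a hedged illustration only — a scalar tridiagonal system, dense solves standing in for the reused LU factorizations, and a 2x2 reduced system where the real algorithm builds a recursive one; `spike_2part` is an illustrative name.

```python
import numpy as np

def spike_2part(A, b):
    """Two-partition SPIKE sketch for a tridiagonal system A x = b.
    Each diagonal block is solved independently (in the real algorithm,
    by separate threads); a tiny reduced system couples the partitions."""
    n = len(b)
    m = n // 2
    A1, A2 = A[:m, :m], A[m:, m:]
    c, d = A[m - 1, m], A[m, m - 1]        # off-block coupling entries
    e_top = np.zeros(m); e_top[-1] = c
    e_bot = np.zeros(n - m); e_bot[0] = d
    # "spikes" and partial solutions
    v = np.linalg.solve(A1, e_top)
    w = np.linalg.solve(A2, e_bot)
    g1 = np.linalg.solve(A1, b[:m])
    g2 = np.linalg.solve(A2, b[m:])
    # reduced system in the two interface unknowns x1[-1] and x2[0]
    R = np.array([[1.0, v[-1]], [w[0], 1.0]])
    x1m, x20 = np.linalg.solve(R, [g1[-1], g2[0]])
    # retrieve the full solution from the partial solutions and spikes
    x1 = g1 - v * x20
    x2 = g2 - w * x1m
    return np.concatenate([x1, x2])

n = 8
A = (np.diag(4.0 * np.ones(n))
     + np.diag(-1.0 * np.ones(n - 1), 1)
     + np.diag(-1.0 * np.ones(n - 1), -1))
b = np.arange(1.0, n + 1)
x = spike_2part(A, b)
```

With p partitions the reduced system grows, and the recursive reduced-system method (whose power-of-two thread restriction the thesis lifts) applies SPIKE to it again.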
3

Particle Simulation using Asynchronous Compute : A Study of The Hardware

Enarsson, Kim January 2020 (has links)
Background. With the introduction of the compute shader, followed by the application programming interface (API) DirectX 12, the modern GPU is going through a transformation. Previously the GPU was used as a massive computational tool for running a single task at unparalleled speed. The compute shader made it possible to run CPU-like programs on the GPU; DirectX 12 takes this even further by introducing a multi-engine architecture. The multi-engine architecture unlocks the possibility of running the compute shader alongside the regular graphical stages, a concept called asynchronous compute. Objectives. This thesis aims to investigate whether asynchronous compute can be used to increase the performance of particle simulations. The key metrics studied are total frame time, rendered frames per second, and overlap time. The first two are used to determine whether asynchronous compute improves performance, while the last is used to determine whether the particle simulation is actually running asynchronous compute. Methods. The particle simulation used for this thesis is an N-body particle simulation, implemented using a compute shader as part of a larger DirectX 12 framework. One application is implemented that runs two different execution models: the standard sequential execution model and the asynchronous compute model. The main difference between the two is that the sequential execution model uses only one command queue, a 3D command queue, while the asynchronous compute model runs a separate compute command queue alongside the 3D command queue. The performance metrics are collected using a custom-built GPU profiler. Results. The results indicate that it is possible to increase the performance of particle simulations using asynchronous compute.
The registered performance gain reaches as high as 34% on hardware that supports asynchronous compute, while hardware that according to NVIDIA does not support asynchronous compute registered performance gains of up to 11%. In terms of overlap time between the compute workload and the graphical workload, the AMD GPU showed an overlap time that matched the frame time; the NVIDIA GPUs, however, did not show the expected overlap time. Conclusions. It can be determined that asynchronous compute provides benefits over the sequential execution model and can be used to increase the performance of particle simulations. However, since the research in this thesis used only a single particle simulation, more work is needed: for example, testing whether the performance gain can be improved even further using methods such as workload pairing or multiple GPUs. That kind of work requires a larger-scale application consisting of multiple different tasks rather than a single particle simulation.
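The per-particle work the thesis dispatches as a compute-shader pass can be sketched on the CPU. This is a hedged, vectorised stand-in (unit masses, G = 1, softened gravity) for whatever force model the thesis actually uses; `nbody_step` and its parameters are illustrative.

```python
import numpy as np

def nbody_step(pos, vel, dt=0.01, eps=1e-3):
    """One all-pairs N-body update: the work a compute shader would do
    with one thread per particle, written with NumPy broadcasting."""
    diff = pos[None, :, :] - pos[:, None, :]       # r_ij = p_j - p_i
    dist2 = (diff ** 2).sum(-1) + eps ** 2         # softened distances
    inv_d3 = dist2 ** -1.5
    np.fill_diagonal(inv_d3, 0.0)                  # no self-interaction
    acc = (diff * inv_d3[:, :, None]).sum(axis=1)  # unit masses, G = 1
    vel = vel + dt * acc
    pos = pos + dt * vel
    return pos, vel

rng = np.random.default_rng(0)
pos = rng.standard_normal((64, 3))
vel = np.zeros((64, 3))
pos2, vel2 = nbody_step(pos, vel)
```

Because every particle reads every other particle, the pass is embarrassingly parallel and compute-bound, which is what makes it a good candidate for overlapping with graphics work.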
4

Comparison of Technologies for General-Purpose Computing on Graphics Processing Units

Sörman, Torbjörn January 2016 (has links)
The computational capacity of graphics cards for general-purpose computing has progressed fast over the last decade. A major reason is computationally heavy computer games, where standards of performance and high-quality graphics constantly rise. Another reason is better-suited technologies for programming the graphics cards. Combined, the product is high raw-performance devices and the means to access that performance. This thesis investigates some of the current technologies for general-purpose computing on graphics processing units. Technologies are compared primarily by benchmarking performance and secondarily by factors concerning programming and implementation. The choice of technology can have a large impact on performance. The benchmark application found the difference in execution time between the fastest technology, CUDA, and the slowest, OpenCL, to be a factor of two. It also found that the older technologies, OpenGL and DirectX, are competitive with CUDA and OpenCL in terms of resulting raw performance.
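Comparisons like the one above hinge on a careful timing harness. As a hedged sketch of the general shape of such a harness (warm-up runs, then the median of repeated measurements; the thesis's actual GPU timing necessarily uses device-side timers rather than wall-clock timing of Python callables):

```python
import time
import statistics

def benchmark(fn, *args, warmup=3, runs=10):
    """Median-of-runs wall-clock timing: warm up first so caches and
    lazy initialisation don't pollute the measured samples."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

t = benchmark(sorted, list(range(10_000)))
```

The median is preferred over the mean here because scheduler noise skews timing samples in one direction only.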
5

Volume rendering with Marching cubes and async compute

Tlatlik, Max Lukas January 2019 (has links)
With the addition of the compute shader stage for GPGPU hardware, it has become possible to run CPU-like programs on modern GPU hardware. The greatest benefit is seen for algorithms of a highly parallel nature, and in the case of volume rendering the Marching cubes algorithm makes a great candidate due to its simplicity and parallel nature. For this thesis, the Marching cubes algorithm was implemented in a compute shader and used in a DirectX 12 framework to determine whether GPU frame-time performance can be improved by executing the compute command queue in parallel with the graphics command queue. Results from performance benchmarks show that a gain is present for each benchmarked configuration, with the largest gains, up to 52%, seen for smaller workloads. This information could therefore prove useful for game developers who want to improve framerates or decrease development time, but also in other fields such as volume rendering for medical images.
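The per-cell step that makes Marching cubes so parallel is tiny: classify a voxel by which of its eight corners are inside the iso-surface, yielding an 8-bit case index into the triangle table. A minimal sketch of that classification (the triangle table itself is omitted; `cube_index` is an illustrative name):

```python
def cube_index(corner_values, iso=0.0):
    """Build the 8-bit Marching cubes case index for one voxel from its
    eight corner samples. On the GPU this is the per-cell work of the
    compute-shader pass, one thread (or lane) per voxel."""
    idx = 0
    for bit, v in enumerate(corner_values):
        if v < iso:
            idx |= 1 << bit
    return idx  # 0 or 255 means the cell is fully outside/inside: no triangles

inside = cube_index([-0.5] * 8)    # all corners below iso
outside = cube_index([0.5] * 8)    # all corners above iso
mixed = cube_index([-0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
```

Since each voxel is classified independently, the pass maps directly onto one compute dispatch over the volume grid.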
6

A Method to Symbolically Compute Convolution Integrals

Peasgood, Richard January 2009 (has links)
This thesis presents a method for computing symbolic solutions of a certain class of improper integrals related to convolutions of Mellin transforms. Important integrals that fall into this category are integral transforms such as the Fourier, Laplace, and Hankel transforms. The method originated in a presentation by Salvy; however, many of the details of the method were absent. We present the method of Salvy in full, which computes a linear homogeneous differential equation satisfied by the integral in question. A theory of contour integrals is introduced that is related to the contour definition of Meijer G functions. This theory is used to prove the correctness of the method of Salvy and also gives a way to compute regions of validity for the computed solutions. We then extend the method to compute symbolic solutions of the integral along with the regions where those solutions are valid.
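For readers unfamiliar with the setting, the integrals in question instantiate a standard identity (this is textbook material on Mellin transforms, not the thesis's own derivation):

```latex
% Mellin transform of f, and the Parseval/convolution identity that
% transforms such as Laplace and Hankel instantiate
F(s) = \int_0^{\infty} f(x)\, x^{s-1}\, dx ,
\qquad
\int_0^{\infty} f(x)\, g(x)\, dx
  = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} F(s)\, G(1-s)\, ds .
```

The contour integral on the right is the kind of Meijer G-type integral to which the abstract's theory of contour integrals applies.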
8

The Implementation and Applications of Multi-pattern Matching Algorithm over General Purpose GPU

Cheng, Yan-Hui 08 July 2011 (has links)
As computer technology becomes ever more developed, in daily life, research, and work we often use a variety of computer equipment to help us process frequently used data. The types and quantity of data keep growing: satellite imaging data, genetic engineering, global climate forecasting data, complex event processing, and so on. Certain types of data require both accuracy and timeliness; that is, we hope to find some data in a shorter time. According to an MIT Technology Review report published in August 2010, complex event processing has become a new research area, and it also includes data search. Data search often means data comparison: given specified keywords or key information we are looking for, we design a pattern matching algorithm to find the results within a shorter time, or even in real time. The purpose of our research is to use a general-purpose GPU, the NVIDIA Tesla C2050, with a parallel computing architecture to implement parallel pattern matching. Finally, we construct a service to handle a large amount of real-time data. We also run performance tests and compare the results with the well-known software "Apache Solr" to find the differences and possible applications in the future.
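The baseline that a GPU implementation parallelises can be sketched in a few lines. This is a hedged illustration of the problem statement only, not the thesis's algorithm: a naive serial scan, where a GPU version would assign overlapping text chunks (overlap = longest pattern length minus one) to threads; `multi_match` is an illustrative name.

```python
def multi_match(text, patterns):
    """Naive multi-pattern scan: report every (position, pattern) hit.
    A GPU version runs this per text chunk, one chunk per thread, with
    chunks overlapping by len(longest pattern) - 1 to catch boundary hits."""
    hits = []
    for i in range(len(text)):
        for p in patterns:
            if text.startswith(p, i):
                hits.append((i, p))
    return hits

hits = multi_match("abcabd", ["ab", "abd", "cab"])
```

Real multi-pattern matchers (e.g. the Aho-Corasick family) share the scan over all patterns in a single automaton pass, which is what makes the comparison against an indexing engine like Solr meaningful.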
9

Improving Network Reliability: Analysis, Methodology, and Algorithms

Booker, Graham B. 2010 May 1900 (has links)
The reliability of networking and communication systems is vital for the nation's economy and security. Optical and cellular networks have become a critical infrastructure and are indispensable in emergency situations. This dissertation outlines methods for analyzing such infrastructures in the presence of catastrophic failures, such as a hurricane, as well as accidental failures of one or more components. Additionally, it presents a method for protecting against the loss of a single link in a multicast network along with a technique that enables wireless clients to efficiently recover lost data sent by their source through collaborative information exchange. Analysis of a network's reliability during a natural disaster can be assessed by simulating the conditions in which it is expected to perform. This dissertation conducts the analysis of a cellular infrastructure in the aftermath of a hurricane through Monte-Carlo sampling and presents alternative topologies which reduce the resulting loss of calls. While previous research on restoration mechanisms for large-scale networks has mostly focused on handling the failures of single network elements, this dissertation examines the sampling methods used for simulating multiple failures. We present a quick method of finding a lower bound on a network's data loss through enumeration of possible cuts as well as an efficient method of finding a tighter lower bound through genetic algorithms leveraging the niching technique. Mitigation of data losses in a multicast network can be achieved by adding redundancy and employing advanced coding techniques. By using Maximum Rank Distance (MRD) codes at the source, a provider can create a parity packet which is effectively linearly independent from the source packets such that all packets may be transmitted through the network using the network coding technique. This allows all sinks to recover all of the original data even with the failure of an edge within the network.
Furthermore, this dissertation presents a method that allows a group of wireless clients to cooperatively recover from erasures (e.g., due to failures) by using the index coding techniques.
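The recover-from-one-erasure idea behind the parity packet can be illustrated with the simplest possible code. This is a hedged stand-in: a single bytewise XOR parity rather than the MRD-coded parity the dissertation constructs (MRD codes give stronger guarantees under network coding); `xor_parity` and `recover` are illustrative names.

```python
def xor_parity(packets):
    """Build one parity packet as the bytewise XOR of the source packets:
    one extra packet lets a sink recover from any single erasure."""
    out = bytes(len(packets[0]))
    for p in packets:
        out = bytes(a ^ b for a, b in zip(out, p))
    return out

def recover(received, parity):
    """XOR the surviving packets with the parity to rebuild the lost one
    (XOR is its own inverse, so the survivors cancel out)."""
    return xor_parity(received + [parity])

pkts = [b"net", b"cod", b"ing"]
par = xor_parity(pkts)
lost = recover([pkts[0], pkts[2]], par)   # pkts[1] was erased
```

The same cancel-out principle, generalized to linear combinations over larger fields, is what lets network-coded sinks solve for the original data after an edge failure.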
10

Bewertung der Compute-Leistung von Workstations mit SPEC-CPU Benchmarks / Evaluating the compute performance of workstations with SPEC CPU benchmarks

Mund, Carsten 29 July 1996 (has links) (PDF)
After an introduction to SPEC and its evaluation procedures, the way SPEC performance measurements work is examined in more detail. The main part covers the execution and analysis of SPEC benchmarks on five workstations. The results obtained are compared with the officially published SPEC figures and discussed.
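SPEC aggregates per-benchmark ratios into a single score by geometric mean, so that no single benchmark dominates the result. A minimal sketch of that aggregation (the `spec_score` name and the example ratios are illustrative):

```python
import math

def spec_score(ratios):
    """Aggregate per-benchmark SPECratios (reference time / measured time)
    into one figure by geometric mean, as SPEC's reporting rules do."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

score = spec_score([2.0, 8.0])   # geometric mean of 2 and 8 is 4
```

This is why a machine that doubles its speed on one benchmark while halving it on another keeps the same score, whereas an arithmetic mean would reward the outlier.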
