1 |
A Compiler Framework to Support and Exploit Heterogeneous Overlapping-ISA Multiprocessor Platforms
Jelesnianski, Christopher Stanisław, 15 December 2015
As demand for ever more powerful machines continues, new architectures are sought to break past the brick wall that currently stalls the performance growth of modern multi-core CPUs. Due to physical limitations, scaling single-core performance any further is no longer possible, which gave rise to modern multi-cores. However, the brick wall is now limiting the scaling of general-purpose multi-cores as well. Heterogeneous-core CPUs have the potential to continue scaling by reducing power consumption through the use of specialized and simple cores within the same chip.
Heterogeneous-core CPUs join fundamentally different processors, each with its own particular strengths (e.g., fast execution time or improved power efficiency), enabling the building of versatile computing systems. For heterogeneous platforms to permeate the computer market, the next hurdle is to provide a familiar programming model and environment so that developers do not have to focus on platform details. However, heterogeneous platforms integrate processors with diverse characteristics and potentially different Instruction Set Architectures (ISAs), which exacerbates the complexity of the software. A brave few have begun to tread down the heterogeneous-ISA path, hoping to prove that this avenue will yield the next generation of supercomputers. However, many unforeseen obstacles have yet to be discovered.
This challenge brings a clear need for efficient, developer-friendly, adaptable system software to support the effort of making heterogeneous-ISA the gold standard for future high-performance and general-purpose computing. To foster rapid development of this technology, it is imperative to put the proper tools, such as application and architecture profiling engines, into the hands of developers, in order to realize the best heterogeneous-ISA platform possible with available technology. In addition, such tools should be as "timeless" as possible, exposing fundamental concepts that industry could benefit from and adopt in future designs.
We demonstrate the feasibility of a compiler framework and runtime for an existing heterogeneous-ISA operating system (Popcorn Linux) that automatically schedules compute blocks within an application on a given heterogeneous-ISA high-performance platform (in our case, a platform built with Intel Xeon and Xeon Phi). With the introduced Profiler, Partitioner, and Runtime support, we show that we can automatically exploit the heterogeneity of an overlapping-ISA platform and run faster than native execution and other parallel programming models.
Empirically evaluating our compiler framework, we show that application execution on Popcorn Linux can be up to 52% faster than the most performant native execution for Xeon or Xeon Phi. Using our compiler framework relieves the developer from manual scheduling and porting of applications, requiring only a single profiling run per application. / Master of Science
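The idea of mapping profiled compute blocks to processors can be illustrated with a minimal sketch. It is not the Partitioner from the thesis: the block names, timings, migration cost, and the greedy rule are all hypothetical, chosen only to show how a single profiling run could drive placement decisions.

```python
from typing import Dict

# Hypothetical per-block profile (seconds on each target); not data from the thesis.
PROFILE: Dict[str, Dict[str, float]] = {
    "init":    {"xeon": 0.20, "xeon_phi": 0.90},
    "stencil": {"xeon": 4.00, "xeon_phi": 1.50},
    "reduce":  {"xeon": 0.60, "xeon_phi": 0.55},
}
MIGRATION_COST = 0.30  # assumed one-way migration cost in seconds, illustrative only


def partition(profile, home="xeon", migration_cost=MIGRATION_COST):
    """Map each block to the target with the lowest estimated time.

    A block leaves the home processor only if its remote time plus a
    round-trip migration still beats running it at home.
    """
    plan = {}
    for block, times in profile.items():
        best, best_time = home, times[home]
        for target, seconds in times.items():
            if target == home:
                continue
            estimate = seconds + 2 * migration_cost
            if estimate < best_time:
                best, best_time = target, estimate
        plan[block] = best
    return plan


plan = partition(PROFILE)
```

With these assumed numbers only `stencil` is moved to the Xeon Phi; `reduce` stays on the Xeon because its small speedup does not pay for the migration.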
|
2 |
Kernel optimization by layout restructuring / Estimation d'efficacité et restructuration automatisées de noyaux de calcul
Haine, Christopher, 03 July 2017
Careful data layout design is crucial for achieving high performance, because current processors spend a considerable amount of time stalled on memory transactions. In particular, spatial and temporal data locality have to be optimized. However, data layout transformations are left largely unexplored by state-of-the-art compilers, because the performance gains of potential transformations are difficult to evaluate. Moreover, optimizing data layout is time-consuming and error-prone, and layout transformations are too numerous to be tried by hand in the hope of discovering a high-performance version.
We propose to guide application programmers through data layout restructuring with extensive feedback, first by providing a comprehensive multidimensional description of the initial layout, built by analyzing memory traces collected from the application binary, with the final aim of pinpointing problematic strides at the instruction level, independently of the input language. We focus on layout transformations that can be expressed in a C-like formalism to aid user understanding, and we apply and assess them on a case study composed of two representative multithreaded real-life applications, a cardiac wave simulation and a lattice QCD simulation, with different inputs and parameters. The predicted performance of the different transformations is within 5% of the hand-optimized layout code.
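To make the notion of instruction-level strides in a memory trace concrete, here is a minimal sketch. The trace format, the instruction labels, and the 32-byte element size are assumptions made for illustration; the actual analysis in the thesis may work quite differently.

```python
from collections import Counter, defaultdict


def dominant_strides(trace):
    """Return {instruction: (stride_in_bytes, share)} for a memory trace.

    `trace` is an iterable of (instruction_id, address) pairs in program
    order. The stride of an instruction is the difference between two
    consecutive addresses it touches; `share` is the fraction of its
    accesses that follow the most frequent stride.
    """
    last_address = {}
    deltas = defaultdict(Counter)
    for instr, addr in trace:
        if instr in last_address:
            deltas[instr][addr - last_address[instr]] += 1
        last_address[instr] = addr
    report = {}
    for instr, counts in deltas.items():
        stride, hits = counts.most_common(1)[0]
        report[instr] = (stride, hits / sum(counts.values()))
    return report


# Hypothetical trace: "load_x" walks one 8-byte field of 32-byte structs
# (array-of-structs layout), "load_y" walks a contiguous array of 8-byte values.
ELEMENT_SIZE = 32
trace = [("load_x", 0x1000 + i * ELEMENT_SIZE) for i in range(8)]
trace += [("load_y", 0x2000 + i * 8) for i in range(8)]
report = dominant_strides(trace)
```

Here `load_x` shows a 32-byte stride while reading 8-byte values, the kind of pattern that a structure-of-arrays transformation would turn into the unit stride seen for `load_y`.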
|
3 |
Performance-cost trade-offs in heterogeneous clouds / Compromis performance-coût dans les clouds hétérogènes
Iordache, Ancuta, 09 September 2016
Cloud infrastructures provide on-demand access to a large variety of computing devices with different performance and cost. This creates many opportunities for cloud users to run applications with complex resource requirements, ranging from large numbers of servers with low-latency interconnects to specialized devices such as GPUs and FPGAs. User expectations regarding the execution of applications may vary between the fastest possible execution, the cheapest execution, or any trade-off between the two extremes.
However, enabling cloud users to easily make performance-cost trade-offs is not a trivial exercise, and choosing the right amount and type of resources to run applications according to user expectations is very difficult. This thesis proposes three contributions to enable performance-cost trade-offs for application execution in heterogeneous clouds by following two directions: making good use of resources and making a good choice of resources. As a first contribution, we propose a method to share FPGA-based accelerators in cloud infrastructures, with the objective of improving their utilization. As a second contribution, we propose profiling methods that model the resource demands of applications and automate the selection of heterogeneous resources for executing them under user objectives. Finally, we demonstrate how these technologies can be implemented and exploited in heterogeneous cloud platforms.
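One common way to reason about such performance-cost trade-offs is to keep only the Pareto-optimal resource configurations and then pick among them according to the user objective. The sketch below assumes this approach; the configuration names, runtimes, and prices are invented for illustration and do not come from the thesis.

```python
from typing import List, NamedTuple, Optional


class Config(NamedTuple):
    name: str
    runtime_s: float  # profiled or predicted execution time
    cost: float       # price of running the job on this configuration


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep configurations that no other one beats on both time and cost."""
    front = []
    for c in configs:
        dominated = any(
            o.runtime_s <= c.runtime_s and o.cost <= c.cost
            and (o.runtime_s < c.runtime_s or o.cost < c.cost)
            for o in configs
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.runtime_s)


def cheapest_within(configs: List[Config], deadline_s: float) -> Optional[Config]:
    """Pick the cheapest Pareto-optimal configuration that meets a deadline."""
    feasible = [c for c in pareto_front(configs) if c.runtime_s <= deadline_s]
    return min(feasible, key=lambda c: c.cost) if feasible else None


# Hypothetical profiling results for one application.
CANDIDATES = [
    Config("4 x CPU", 1200.0, 2.0),
    Config("16 x CPU", 400.0, 3.2),
    Config("1 x GPU", 300.0, 2.5),
    Config("1 x FPGA", 350.0, 4.0),
]
choice = cheapest_within(CANDIDATES, deadline_s=600.0)
```

With these assumed numbers the front contains only the GPU and the small CPU configuration; a 600-second deadline selects the GPU, while a relaxed deadline would fall back to the cheaper CPU option.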
|
4 |
Improve game performance tracking tools : Heatmap as a tool / Förbättra prestandaspårningsverktyg : Färgdiagram för visualisering av prestanda
Wessman, Niklas, January 2022
Software testing is a crucial development technique for catching defects and slow code. For 3D graphics, it is hard to create automatic tests that detect errors or slow performance. Finding performance issues in game maps is a complex task that requires much manual work. Gaming companies such as EA DICE could benefit from automating the process of finding these performance issues in their game maps. This thesis tries to solve the problem by creating automatic tests in which the camera is placed in a top-down perspective and flies over the in-game map, recording the time it takes to create render frames and client simulation frames for each map segment. The resulting trace is then visualised as a heatmap, where the mean frame creation times are rendered with pseudo-colouring techniques to help test engineers pinpoint possible issues.
The key findings of this thesis are that a heatmap visualisation of frame creation times saves much time for developers trying to find these issues, and that it lowers the amount of knowledge needed to find performance issues. The tool automates a process that formerly needed considerable manual work to get the same result. Now, artists with little coding experience can find performance issues without the technical knowledge of a Quality Assurance engineer. The thesis also highlights the drawbacks of a top-down camera trace, since this is not how EA DICE games are usually rendered for the player at runtime. This thesis can serve as a starting point for others who want to develop test tools, for example with other ways of moving the camera and visualising the trace.
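The aggregation step behind such a heatmap, averaging frame times per map segment and mapping the means to colours, can be sketched as follows. The sample format, the square grid, and the green-to-red colour ramp are assumptions for illustration and are not taken from the tool described in the thesis.

```python
from collections import defaultdict


def mean_frame_times(samples, cell_size):
    """Average frame time per grid cell.

    `samples` is an iterable of (x, y, frame_ms) tuples recorded along
    the camera path; cells are squares of side `cell_size` in map units.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for x, y, frame_ms in samples:
        cell = (int(x // cell_size), int(y // cell_size))
        sums[cell][0] += frame_ms
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


def pseudo_colour(value, lo, hi):
    """Map a value to an (r, g, b) tuple from green (fast) to red (slow)."""
    if hi <= lo:
        return (0, 255, 0)
    t = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    return (round(255 * t), round(255 * (1 - t)), 0)


# Hypothetical samples: (x, y, frame time in milliseconds).
SAMPLES = [(5, 5, 10.0), (8, 2, 14.0), (15, 5, 30.0), (12, 18, 16.0)]
means = mean_frame_times(SAMPLES, cell_size=10)
lo, hi = min(means.values()), max(means.values())
colours = {cell: pseudo_colour(ms, lo, hi) for cell, ms in means.items()}
```

Each entry in `colours` can then be drawn as one tile of the map, so the slowest segments stand out in red.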
|
5 |
Characterizing applications by integrating and improving tools for data locality analysis and program performance
Singh, Saurabh, 21 September 2017
No description available.
|
6 |
Rozšíření systému pro zákonné odposlechy / Additions to Lawful Interception System
Hranický, Radek, January 2014
A Lawful Interception System was developed as part of the Modern Tools for Detection and Mitigation of Cyber Criminality on the New Generation Internet project. This thesis describes additions to the system that make it possible to intercept application protocols (e.g., e-mail communication) directly in the network of an Internet service provider. The new functionality enables automatic detection and filtering of the related TCP traffic. It can also handle situations in which the identity (IP address) of a target user is not yet known or is difficult to detect (NAT is in use, the user is at an Internet café, behind a firewall, etc.). One of the most important requirements for the developed prototype is fast packet processing with maximum throughput and minimal packet loss. Therefore, this thesis also includes performance profiling, identification of critical points, and their optimization.
|