181

Profilem řízené optimalizace pro instrukční vyrovnávací paměti / Profile-Guided Optimizations for Instruction Caches

Bobek, Jiří January 2015
Instruction cache performance is very important for the overall performance of a computer. The placement of code blocks in memory can significantly affect the cache miss rate, which means a compiler can improve a program's performance by placing parts of its code at the right addresses in memory. This work discusses several methods for collecting profile information and describes an algorithm that uses profile information to guide code block placement. The algorithm is implemented in the optimizer of the LLVM compiler, and the resulting improvements in cache performance are evaluated.
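The abstract does not reproduce the placement algorithm itself. As a rough illustration of the general technique — greedy chain merging over profiled branch counts, in the spirit of Pettis and Hansen's classic code-positioning work — a sketch might look like the following; the types and function are illustrative assumptions, not LLVM's actual pass interfaces.

```cpp
// Sketch of greedy, profile-guided block chaining: hot edges are
// merged first so that frequently taken branches fall through to
// adjacent code, improving instruction-cache locality.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Edge { int from, to; std::uint64_t count; };  // profiled branch counts

std::vector<int> placeBlocks(int numBlocks, std::vector<Edge> edges) {
    // Each block starts as its own chain.
    std::vector<std::vector<int>> chains(numBlocks);
    std::vector<int> chainOf(numBlocks);
    for (int b = 0; b < numBlocks; ++b) { chains[b] = {b}; chainOf[b] = b; }

    // Process edges hottest-first; merge two chains when 'from' ends one
    // chain and 'to' begins another, making the branch a fall-through.
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.count > b.count; });
    for (const Edge& e : edges) {
        int cf = chainOf[e.from], ct = chainOf[e.to];
        if (cf == ct || chains[cf].back() != e.from || chains[ct].front() != e.to)
            continue;
        for (int b : chains[ct]) { chains[cf].push_back(b); chainOf[b] = cf; }
        chains[ct].clear();
    }

    // Concatenate surviving chains into the final memory layout order.
    std::vector<int> layout;
    for (const auto& c : chains) layout.insert(layout.end(), c.begin(), c.end());
    return layout;
}
```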
182

Analyse und Erweiterung des Paradyn Performance Tools / Analysis and Extension of the Paradyn Performance Tool

Arndt, Michael 12 May 2006
The freely available performance analysis tool Paradyn is examined with regard to its suitability for the performance analysis of quantum-mechanical applications (specifically Abinit). In addition, Paradyn is extended so that analysis by means of existing hardware counters becomes possible. Since Paradyn is platform-independent, performance counter libraries such as PCL or PAPI are used.
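PAPI is one of the counter libraries named above; a minimal sketch of its usual usage pattern is shown below. The measured loop is a placeholder workload and error handling is abbreviated — this is an illustration of the API's shape, not code from the thesis.

```cpp
// Minimal PAPI usage sketch: count L1 data-cache misses around a
// region of interest.
#include <papi.h>
#include <cstdio>
#include <cstdlib>

int main() {
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI init failed\n");
        return EXIT_FAILURE;
    }
    int eventSet = PAPI_NULL;
    PAPI_create_eventset(&eventSet);
    PAPI_add_event(eventSet, PAPI_L1_DCM);   // L1 data-cache misses

    PAPI_start(eventSet);
    volatile double sum = 0;                  // placeholder workload
    for (int i = 0; i < 1000000; ++i) sum += i * 0.5;
    long long misses = 0;
    PAPI_stop(eventSet, &misses);

    printf("L1 data cache misses: %lld\n", misses);
    return 0;
}
```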
183

Ghoul: A cache-friendly programming language

Temmel, Adam January 2020
Performance has historically always been of importance to computing, and as such, processor developers have devised several different methods to squeeze more processing power out of the processor. One of these concepts is the CPU cache memory, whose responsibility is to hold data the processor expects it might use soon. Utilizing the cache well means the processor can process data at a much higher rate, with a direct impact on performance. It follows that it is in the developer's best interest to write code capable of utilizing the cache memory to its full extent. This is not always an easy task, however, as the patterns and style of programming the developer may need to adapt to can come off as cumbersome. This study explores the possibilities of merging cache-friendly programming concepts with a developer-friendly syntax, resulting in a language that is readable and writable as well as efficient with regard to the processor cache. To accomplish this task, studies of memory access patterns, existing programming languages, and compiler design have been performed. The end product is a language called Ghoul which implements cache-friendly concepts on a syntactic level, complete with a working compiler. Output from this compiler was benchmarked to assess whether the concepts introduced had a measurable impact on the performance of programs written in Ghoul, showing that the aforementioned syntactic concepts indeed directly influence the speed at which data can be processed.
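The abstract does not show Ghoul syntax. As a generic C++ illustration of the kind of cache-friendly transformation such a language can automate at the syntactic level, compare an array-of-structures traversal with a structure-of-arrays layout that keeps only the fields being touched in cache; the particle example is an assumption for illustration.

```cpp
// Array-of-structures (AoS): updating positions also drags each
// particle's unrelated fields through the cache.
#include <cstddef>
#include <vector>

struct ParticleAoS { float x, y, z; float mass, charge, lifetime; };

void stepAoS(std::vector<ParticleAoS>& ps, float dt) {
    for (auto& p : ps) p.x += dt;   // touches the whole 24-byte struct
}

// Structure-of-arrays (SoA): the same update streams through a densely
// packed array, so every cache line fetched is fully used.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass, charge, lifetime;
};

void stepSoA(ParticlesSoA& ps, float dt) {
    for (std::size_t i = 0; i < ps.x.size(); ++i) ps.x[i] += dt;
}
```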
184

Implicitní reprezentace množin / An implicit representation of sets

Lieskovský, Matej January 2020
In our bachelor thesis, we described an implicit data structure that, given a way to maintain an implicit representation of polylogarithmic buckets, could implement all the dynamic ordered dictionary operations in logarithmic time. We now fulfill our obligation and provide a corresponding construction of implicit buckets.
185

Prestandaanalys av cache i webbmiljö / Performance analysis of cache in a web environment

Maatson, Mats, Denke, Joel January 2014
Dramatify provides TV and film production companies with software as a service for project management. The web application is accessible from any device with a modern web browser, from anywhere in the world. Dramatify was experiencing performance issues with high latency and needed help implementing caching for maximum performance gain. To solve the problem, prior work in the area was surveyed and suitable tools were identified for implementing, testing, and analyzing prototypes that store data in cache. The result was two prototypes for managing the cache, one for the client and one for the server, each storing every production piecewise in cache. Performance testing was done with automatic tests on multiple devices in different web browsers, collecting relevant data to measure cache-related performance for page requests relative to the original implementation. Analysis of the collected test data showed that the client prototype was 32 percent faster and the server prototype 21 percent faster than Dramatify's original implementation.
186

Modélisation de performance des caches basée sur l'analyse de données / A Data Driven Approach for Cache Performance Modeling

Olmos Marchant, Luis Felipe 30 May 2016
The need to distribute massive quantities of multimedia content to multiple users has increased tremendously in the last decade. The current solution to this ever-growing demand is Content Delivery Networks, an application-layer architecture that today handles the majority of multimedia traffic. This distribution problem has also motivated the study of new solutions such as the Information Centric Networking paradigm, whose aim is to add content delivery capabilities to the network layer by decoupling data from its location. In both architectures, cache servers play a key role, allowing efficient use of network resources for content delivery. As a consequence, the study of cache performance evaluation techniques has found new momentum in recent years.

In this dissertation, we propose a framework for the performance modeling of a cache ruled by the Least Recently Used (LRU) discipline. Our framework is data-driven since, in addition to the usual mathematical analysis, we address two additional data-related problems: the first is to propose a model that is a priori both simple and representative of the essential features of the measured traffic; the second is the estimation of the model parameters from traffic traces. The contributions of this thesis concern each of these tasks.

For our first contribution, we propose a parsimonious traffic model featuring a document catalog that evolves in time. We achieve this by allowing each document to be available for a limited (random) period of time. To make a sensible proposal, we apply the "semi-experimental" method to real data. These semi-experiments consist of two phases: first, we randomize the traffic trace to break specific dependence structures in the request sequence; second, we simulate an LRU cache with the randomized request sequence as input. For a candidate model, we refute an independence hypothesis if the resulting hit probability curve differs significantly from the one obtained from the original trace. With the insights obtained, we propose a traffic model based on so-called Poisson cluster point processes.

Our second contribution is a theoretical estimation of the cache hit probability for a generalization of the latter model. For this objective, we use the Palm distribution of the model to set up a probability space in which a document can be singled out for analysis. In this setting, we obtain an integral formula for the average number of misses. Finally, by means of a scaling of system parameters, we derive an asymptotic expansion of this expression for large cache sizes. This expansion quantifies the error of a heuristic widely used in the literature, known as the "Che approximation", thus justifying and extending it in the process.

Our last contribution concerns the estimation of the model parameters. We tackle this problem for the simpler and widely used Independent Reference Model. By considering its parameter (a popularity distribution) to be a random sample, we implement a Maximum Likelihood method to estimate it. This method allows us to seamlessly handle the censoring phenomena occurring in traces. By measuring the cache performance obtained with the resulting model, we show that this method provides a more representative model of the data than typical ad hoc methodologies.
187

Etude et évaluation de politiques d'ordonnancement temps réel multiprocesseur / Study and evaluation of real-time multiprocessor scheduling policies

Cheramy, Maxime 11 December 2014
Numerous algorithms have been proposed to address the scheduling of real-time tasks on multiprocessor architectures, and new scheduling policies are still being defined: without any guarantee of completeness, we have identified more than fifty of them. This large diversity makes a comparative analysis of their behavior and performance difficult. This research aims at enabling the study and evaluation of the main existing scheduling policies. The first contribution is SimSo, a new simulation tool dedicated to the evaluation of scheduling algorithms. Using this tool, we were able to compare the performance of twenty algorithms. The second contribution is the consideration, in the simulation, of the temporal overheads related to the execution of the scheduler code and of the influence of memory caches on job execution times, through the introduction of statistical models evaluating cache miss ratios.
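SimSo is a real simulator, but its internal interfaces are not shown in the abstract. The toy function below merely illustrates the kind of statistical correction described — inflating a job's ideal execution time by an estimated miss ratio and miss penalty plus scheduler overhead; the linear model, names, and constants are all assumptions for illustration, not SimSo's actual equations.

```cpp
// Toy model of the overhead accounting described above: a job's ideal
// execution time is inflated by cache-miss stalls and scheduler cost.
#include <cstdio>

struct JobModel {
    double idealCycles;      // execution time with a perfect cache
    double memAccessRate;    // memory accesses per cycle
    double missRatio;        // estimated by a statistical cache model
    double missPenalty;      // cycles lost per cache miss
};

double effectiveCycles(const JobModel& j, double schedulerOverhead) {
    double memStall = j.idealCycles * j.memAccessRate
                    * j.missRatio * j.missPenalty;
    return j.idealCycles + memStall + schedulerOverhead;
}

int main() {
    JobModel job{1e6, 0.3, 0.02, 200};       // hypothetical numbers
    // 1e6 + 1e6*0.3*0.02*200 + 5e3 = 2.205e6 cycles
    printf("effective cycles: %.0f\n", effectiveCycles(job, 5e3));
    return 0;
}
```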
188

Evaluating Direct3D 12 GPU Resource Synchronization on Performance and Cache Operations

Ginola, Nadhif January 2023
Background. Lower-level graphics programming interfaces such as Direct3D 12 require synchronization and data hazards between dependent workloads to be resolved manually. A barrier is a primitive used to resolve synchronization and data hazards, achieving correct behavior by allowing developers to define waits between workloads. However, due to the coarse-grained interface, workloads may be redundantly blocked, and data hazards resolved conservatively, leading to excessive GPU cache flushes even with correct usage.

Objectives. To evaluate whether the novel and more fine-grained enhanced barriers API in Direct3D 12 can provide any improvements over Direct3D 12 resource (legacy) barriers in AMD FidelityFX applications using Direct3D 12.

Methods. An experiment was carried out to investigate the effects of enhanced barriers in existing Direct3D 12 applications. Frame time and the number of GPU cache flushes and invalidations occurring per frame were the primary metrics measured. This was carried out by replacing legacy barriers with enhanced barriers in three of AMD's open-source, state-of-the-art image quality toolkits: FidelityFX Super Resolution (FSR), FidelityFX Super Resolution 2 (FSR2), and Stochastic Screen Space Reflections (SSSR).

Results. The use of enhanced barriers in FSR, FSR2, and SSSR showed no significant differences in frame time or in the number of cache flushes and invalidations occurring within a frame when compared to resource barriers. Configurations of enhanced barriers that may reduce pipeline stall times remain theoretical and could not be verified due to minuscule differences. These include compute-only workload synchronization and non-blocking barrier layout transitions.

Conclusions. Replacing legacy barriers with enhanced barriers in FSR, FSR2, and SSSR proved to be feasible, but lacks the performance benefits that would make it desirable. However, barrier usage varies by application, so different results can arise in other synchronization scenarios. For existing Direct3D 12 applications using resource barriers, it may be advisable not to upgrade to enhanced barriers.
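Both barrier APIs mentioned are real Direct3D 12 interfaces (enhanced barriers require a recent Agility SDK and ID3D12GraphicsCommandList7). A minimal sketch of the kind of replacement the thesis performs — transitioning a texture written by one compute pass for reading in the next — might look like this; the function names are illustrative and the resource and command-list objects are assumed to exist.

```cpp
#include <d3d12.h>

// Legacy resource barrier: one coarse state transition; sync scope and
// cache operations are implied by the before/after states.
void legacyTransition(ID3D12GraphicsCommandList* cmdList,
                      ID3D12Resource* texture) {
    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barrier.Transition.pResource = texture;
    barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_UNORDERED_ACCESS;
    barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE;
    cmdList->ResourceBarrier(1, &barrier);
}

// Enhanced barrier: sync scope, access, and layout are each spelled
// out, so only compute work is blocked and cache operations can be
// narrowed to what the access change actually requires.
void enhancedTransition(ID3D12GraphicsCommandList7* cmdList,
                        ID3D12Resource* texture) {
    D3D12_TEXTURE_BARRIER tex = {};
    tex.SyncBefore   = D3D12_BARRIER_SYNC_COMPUTE_SHADING;
    tex.SyncAfter    = D3D12_BARRIER_SYNC_COMPUTE_SHADING;
    tex.AccessBefore = D3D12_BARRIER_ACCESS_UNORDERED_ACCESS;
    tex.AccessAfter  = D3D12_BARRIER_ACCESS_SHADER_RESOURCE;
    tex.LayoutBefore = D3D12_BARRIER_LAYOUT_UNORDERED_ACCESS;
    tex.LayoutAfter  = D3D12_BARRIER_LAYOUT_SHADER_RESOURCE;
    tex.pResource    = texture;

    D3D12_BARRIER_GROUP group = {};
    group.Type = D3D12_BARRIER_TYPE_TEXTURE;
    group.NumBarriers = 1;
    group.pTextureBarriers = &tex;
    cmdList->Barrier(1, &group);
}
```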
189

Cache Characterization and Performance Studies Using Locality Surfaces

Sorenson, Elizabeth Schreiner 14 July 2005
Today's processors commonly use caches to help overcome the disparity between processor and main memory speeds. Due to the principle of locality, most of the processor's requests for data are satisfied by the fast cache memory, resulting in a significant performance improvement. Methods for evaluating workloads and caches in terms of locality are valuable for cache design. In this dissertation, we present a locality surface which displays both temporal and spatial locality on one three-dimensional graph. We provide a solid, mathematical description of locality data and equations for visualization. We then use the locality surface to examine the locality of a variety of workloads from the SPEC CPU 2000 benchmark suite. These surfaces contain a number of features that represent sequential runs, loops, temporal locality, striding, and other patterns from the input trace. The locality surface can also be used to evaluate methodologies that involve locality. For example, we evaluate six synthetic trace generation methods and find that none of them accurately reproduce an original trace's locality. We then combine a mathematical description of caches with our locality definition to create cache characterization surfaces. These new surfaces visually relate how references with varying degrees of locality function in a given cache. We examine how varying the cache size, line size, and associativity affect a cache's response to different types of locality. We formally prove that the locality surface can predict the miss rate in some types of caches. Our locality surface matches well with cache simulation results, particularly for caches with large associativities. We can qualitatively choose prudent values for cache and line size. Further, the locality surface can predict the miss rate with 100% accuracy for some fully associative caches and with some error for set associative caches. One drawback to the locality surface is the time intensity of the stack-based algorithm. We provide a new parallel algorithm that reduces the computation time significantly. With this improvement, the locality surface becomes a viable and valuable tool for characterizing workloads and caches, predicting cache simulation results, and evaluating any procedure involving locality.
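The stack-based algorithm whose cost the dissertation reduces is, at heart, LRU stack-distance computation over a reference trace. A simple quadratic reference version is sketched below to show what is being sped up; the dissertation's parallel algorithm is not reproduced here.

```cpp
// Naive LRU stack-distance computation: for each reference, the
// distance is the number of distinct addresses touched since the
// previous reference to the same address (-1 for a cold reference).
// This O(n*d) form is the time-intensive baseline.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

std::vector<long> stackDistances(const std::vector<std::uint64_t>& trace) {
    std::vector<std::uint64_t> stack;        // most recent at the back
    std::vector<long> dist;
    dist.reserve(trace.size());
    for (std::uint64_t addr : trace) {
        auto it = std::find(stack.rbegin(), stack.rend(), addr);
        if (it == stack.rend()) {
            dist.push_back(-1);                    // cold (first) reference
        } else {
            dist.push_back(it - stack.rbegin());   // 0 = re-hit of MRU entry
            stack.erase(std::next(it).base());     // remove old position
        }
        stack.push_back(addr);               // address becomes most recent
    }
    return dist;
}

int main() {
    // Trace A B A C B: A and B are cold, A re-hits at depth 1,
    // C is cold, B re-hits at depth 2.
    for (long d : stackDistances({1, 2, 1, 3, 2})) printf("%ld ", d);
    printf("\n");                            // prints: -1 -1 1 -1 2
    return 0;
}
```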
190

Location Cache Design and Performance Analysis for Chip Multiprocessors

Nemeth, Jason 19 September 2008
No description available.
