181

Profilem řízené optimalizace pro instrukční vyrovnávací paměti / Profile-Guided Optimizations for Instruction Caches

Bobek, Jiří January 2015
Instruction cache performance is very important for the overall performance of a computer. The placement of code blocks in memory can significantly affect the cache miss rate, which means a compiler can improve a program's performance by placing parts of its code at the right addresses in memory. This work discusses several methods for collecting profile information and describes an algorithm that uses profile information to guide code block placement. The algorithm is implemented in the optimizer of the LLVM compiler, and the resulting improvements in cache performance are evaluated.
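The abstract does not reproduce the placement algorithm itself. As a rough illustration of the general technique — greedy chain merging over profiled branch counts, in the spirit of Pettis and Hansen's classic code-positioning work — a sketch might look like the following; the types and function are illustrative assumptions, not LLVM's actual pass interfaces.

```cpp
// Sketch of greedy, profile-guided block chaining: hot edges are
// merged first so that frequently taken branches fall through to
// adjacent code, improving instruction-cache locality.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Edge { int from, to; std::uint64_t count; };  // profiled branch counts

std::vector<int> placeBlocks(int numBlocks, std::vector<Edge> edges) {
    // Each block starts as its own chain.
    std::vector<std::vector<int>> chains(numBlocks);
    std::vector<int> chainOf(numBlocks);
    for (int b = 0; b < numBlocks; ++b) { chains[b] = {b}; chainOf[b] = b; }

    // Process edges hottest-first; merge two chains when 'from' ends one
    // chain and 'to' begins another, making the branch a fall-through.
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.count > b.count; });
    for (const Edge& e : edges) {
        int cf = chainOf[e.from], ct = chainOf[e.to];
        if (cf == ct || chains[cf].back() != e.from || chains[ct].front() != e.to)
            continue;
        for (int b : chains[ct]) { chains[cf].push_back(b); chainOf[b] = cf; }
        chains[ct].clear();
    }

    // Concatenate surviving chains into the final memory layout order.
    std::vector<int> layout;
    for (const auto& c : chains) layout.insert(layout.end(), c.begin(), c.end());
    return layout;
}
```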
182

Analyse und Erweiterung des Paradyn Performance Tools / Analysis and Extension of the Paradyn Performance Tool

Arndt, Michael 12 May 2006
The freely available performance analysis tool Paradyn is examined with regard to its suitability for the performance analysis of quantum-mechanical applications (specifically Abinit). In addition, Paradyn is extended so that analysis by means of existing hardware counters becomes possible. Since Paradyn is platform-independent, performance counter libraries such as PCL or PAPI are used.
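PAPI is one of the counter libraries named above; a minimal sketch of its usual usage pattern is shown below. The measured loop is a placeholder workload and error handling is abbreviated — this is an illustration of the API's shape, not code from the thesis.

```cpp
// Minimal PAPI usage sketch: count L1 data-cache misses around a
// region of interest.
#include <papi.h>
#include <cstdio>
#include <cstdlib>

int main() {
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI init failed\n");
        return EXIT_FAILURE;
    }
    int eventSet = PAPI_NULL;
    PAPI_create_eventset(&eventSet);
    PAPI_add_event(eventSet, PAPI_L1_DCM);   // L1 data-cache misses

    PAPI_start(eventSet);
    volatile double sum = 0;                  // placeholder workload
    for (int i = 0; i < 1000000; ++i) sum += i * 0.5;
    long long misses = 0;
    PAPI_stop(eventSet, &misses);

    printf("L1 data cache misses: %lld\n", misses);
    return 0;
}
```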
183

Ghoul: A cache-friendly programming language

Temmel, Adam January 2020
Performance has historically always been of importance to computing, and as such, processor developers have devised several different methods to squeeze more processing power out of the processor. One of these concepts is the CPU cache memory, whose responsibility is to hold data the processor expects it might use soon. Utilizing the cache well means the processor can process data at a much higher rate, with a direct impact on performance. It follows that it is in the developer's best interest to write code capable of utilizing the cache memory to its full extent. This is not always an easy task, however, as the patterns and style of programming the developer may need to adapt to can come off as cumbersome. This study explores the possibilities of merging cache-friendly programming concepts with a developer-friendly syntax, resulting in a language that is readable and writable as well as efficient with regard to the processor cache. To accomplish this task, studies of memory access patterns, existing programming languages, and compiler design have been performed. The end product is a language called Ghoul which implements cache-friendly concepts on a syntactic level, complete with a working compiler. Output from this compiler was benchmarked to assess whether the concepts introduced had a measurable impact on the performance of programs written in Ghoul, showing that the aforementioned syntactic concepts indeed directly influence the speed at which data can be processed.
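The abstract does not show Ghoul syntax. As a generic C++ illustration of the kind of cache-friendly transformation such a language can automate at the syntactic level, compare an array-of-structures traversal with a structure-of-arrays layout that keeps only the fields being touched in cache; the particle example is an assumption for illustration.

```cpp
// Array-of-structures (AoS): updating positions also drags each
// particle's unrelated fields through the cache.
#include <cstddef>
#include <vector>

struct ParticleAoS { float x, y, z; float mass, charge, lifetime; };

void stepAoS(std::vector<ParticleAoS>& ps, float dt) {
    for (auto& p : ps) p.x += dt;   // touches the whole 24-byte struct
}

// Structure-of-arrays (SoA): the same update streams through a densely
// packed array, so every cache line fetched is fully used.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass, charge, lifetime;
};

void stepSoA(ParticlesSoA& ps, float dt) {
    for (std::size_t i = 0; i < ps.x.size(); ++i) ps.x[i] += dt;
}
```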
184

Implicitní reprezentace množin / An implicit representation of sets

Lieskovský, Matej January 2020
In our bachelor thesis, we described an implicit data structure that, given a way to maintain an implicit representation of polylogarithmic buckets, could implement all the dynamic ordered dictionary operations in logarithmic time. We now fulfill our obligation and provide a corresponding construction of implicit buckets.
185

Prestandaanalys av cache i webbmiljö / Performance analysis of cache in a web environment

Maatson, Mats, Denke, Joel January 2014
Dramatify provides TV and film production companies with software as a service for project management. The web application is accessible from any device with a modern web browser, from anywhere in the world. Dramatify was experiencing performance issues with high latency and needed help implementing caching for maximum performance gain. To solve the problem, prior work in the area was surveyed and suitable tools were identified for implementing, testing, and analyzing prototypes that store data in cache. The result was two prototypes for managing the cache, one for the client and one for the server, each storing every production piecewise in cache. Performance testing was done with automatic tests on multiple devices in different web browsers, collecting relevant data to measure cache-related performance for page requests relative to the original implementation. Analysis of the collected test data showed that the client prototype was 32 percent faster and the server prototype 21 percent faster than Dramatify's original implementation.
186

Modélisation de performance des caches basée sur l'analyse de données / A Data Driven Approach for Cache Performance Modeling

Olmos Marchant, Luis Felipe 30 May 2016
The need to distribute massive quantities of multimedia content to multiple users has increased tremendously in the last decade. The current solution to this ever-growing demand is Content Delivery Networks, an application-layer architecture that today handles the majority of multimedia traffic. This distribution problem has also motivated the study of new solutions such as the Information Centric Networking paradigm, whose aim is to add content delivery capabilities to the network layer by decoupling data from its location. In both architectures, cache servers play a key role, allowing efficient use of network resources for content delivery. As a consequence, the study of cache performance evaluation techniques has found new momentum in recent years.

In this dissertation, we propose a framework for the performance modeling of a cache ruled by the Least Recently Used (LRU) discipline. Our framework is data-driven since, in addition to the usual mathematical analysis, we address two additional data-related problems: the first is to propose a model that is a priori both simple and representative of the essential features of the measured traffic; the second is the estimation of the model parameters from traffic traces. The contributions of this thesis concern each of these tasks.

For our first contribution, we propose a parsimonious traffic model featuring a document catalog that evolves in time. We achieve this by allowing each document to be available for a limited (random) period of time. To make a sensible proposal, we apply the "semi-experimental" method to real data. These semi-experiments consist of two phases: first, we randomize the traffic trace to break specific dependence structures in the request sequence; second, we simulate an LRU cache with the randomized request sequence as input. For a candidate model, we refute an independence hypothesis if the resulting hit probability curve differs significantly from the one obtained from the original trace. With the insights obtained, we propose a traffic model based on so-called Poisson cluster point processes.

Our second contribution is a theoretical estimation of the cache hit probability for a generalization of the latter model. For this objective, we use the Palm distribution of the model to set up a probability space in which a document can be singled out for analysis. In this setting, we obtain an integral formula for the average number of misses. Finally, by means of a scaling of system parameters, we derive an asymptotic expansion of this expression for large cache sizes. This expansion quantifies the error of a heuristic widely used in the literature, known as the "Che approximation", thus justifying and extending it in the process.

Our last contribution concerns the estimation of the model parameters. We tackle this problem for the simpler and widely used Independent Reference Model. By considering its parameter (a popularity distribution) to be a random sample, we implement a Maximum Likelihood method to estimate it. This method allows us to seamlessly handle the censoring phenomena occurring in traces. By measuring the cache performance obtained with the resulting model, we show that this method provides a more representative model of the data than typical ad hoc methodologies.
187

Etude et évaluation de politiques d'ordonnancement temps réel multiprocesseur / Study and evaluation of real-time multiprocessor scheduling policies

Cheramy, Maxime 11 December 2014
Numerous algorithms have been proposed to address the scheduling of real-time tasks on multiprocessor architectures, and new scheduling policies are still being defined: without any guarantee of completeness, we have identified more than fifty of them. This large diversity makes a comparative analysis of their behavior and performance difficult. This research aims at enabling the study and evaluation of the main existing scheduling policies. The first contribution is SimSo, a new simulation tool dedicated to the evaluation of scheduling algorithms. Using this tool, we were able to compare the performance of twenty algorithms. The second contribution is the consideration, in the simulation, of the temporal overheads related to the execution of the scheduler code and of the influence of memory caches on job execution times, through the introduction of statistical models evaluating cache miss ratios.
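SimSo is a real simulator, but its internal interfaces are not shown in the abstract. The toy function below merely illustrates the kind of statistical correction described — inflating a job's ideal execution time by an estimated miss ratio and miss penalty plus scheduler overhead; the linear model, names, and constants are all assumptions for illustration, not SimSo's actual equations.

```cpp
// Toy model of the overhead accounting described above: a job's ideal
// execution time is inflated by cache-miss stalls and scheduler cost.
#include <cstdio>

struct JobModel {
    double idealCycles;      // execution time with a perfect cache
    double memAccessRate;    // memory accesses per cycle
    double missRatio;        // estimated by a statistical cache model
    double missPenalty;      // cycles lost per cache miss
};

double effectiveCycles(const JobModel& j, double schedulerOverhead) {
    double memStall = j.idealCycles * j.memAccessRate
                    * j.missRatio * j.missPenalty;
    return j.idealCycles + memStall + schedulerOverhead;
}

int main() {
    JobModel job{1e6, 0.3, 0.02, 200};       // hypothetical numbers
    // 1e6 + 1e6*0.3*0.02*200 + 5e3 = 2.205e6 cycles
    printf("effective cycles: %.0f\n", effectiveCycles(job, 5e3));
    return 0;
}
```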
188

Evaluating Direct3D 12 GPU Resource Synchronization on Performance and Cache Operations

Ginola, Nadhif January 2023
Background. Lower-level graphics programming interfaces such as Direct3D 12 require synchronization and data hazards between dependent workloads to be resolved manually. A barrier is a primitive used to resolve synchronization and data hazards, achieving correct behavior by allowing developers to define waits between workloads. However, due to the coarse-grained interface, workloads may be redundantly blocked, and data hazards resolved conservatively, leading to excessive GPU cache flushes even with correct usage.

Objectives. To evaluate whether the novel and more fine-grained enhanced barriers API in Direct3D 12 can provide any improvements over Direct3D 12 resource (legacy) barriers in AMD FidelityFX applications using Direct3D 12.

Methods. An experiment was carried out to investigate the effects of enhanced barriers in existing Direct3D 12 applications. Frame time and the number of GPU cache flushes and invalidations occurring per frame were the primary metrics measured. This was carried out by replacing legacy barriers with enhanced barriers in three of AMD's open-source, state-of-the-art image quality toolkits: FidelityFX Super Resolution (FSR), FidelityFX Super Resolution 2 (FSR2), and Stochastic Screen Space Reflections (SSSR).

Results. The use of enhanced barriers in FSR, FSR2, and SSSR showed no significant differences in frame time or in the number of cache flushes and invalidations occurring within a frame when compared to resource barriers. Configurations of enhanced barriers that may reduce pipeline stall times remain theoretical and could not be verified due to minuscule differences. These include compute-only workload synchronization and non-blocking barrier layout transitions.

Conclusions. Replacing legacy barriers with enhanced barriers in FSR, FSR2, and SSSR proved to be feasible, but lacks the performance benefits that would make it desirable. However, barrier usage varies by application, so different results can arise in other synchronization scenarios. For existing Direct3D 12 applications using resource barriers, it may be advisable not to upgrade to enhanced barriers.
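Both barrier APIs mentioned are real Direct3D 12 interfaces (enhanced barriers require a recent Agility SDK and ID3D12GraphicsCommandList7). A minimal sketch of the kind of replacement the thesis performs — transitioning a texture written by one compute pass for reading in the next — might look like this; the function names are illustrative and the resource and command-list objects are assumed to exist.

```cpp
#include <d3d12.h>

// Legacy resource barrier: one coarse state transition; sync scope and
// cache operations are implied by the before/after states.
void legacyTransition(ID3D12GraphicsCommandList* cmdList,
                      ID3D12Resource* texture) {
    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barrier.Transition.pResource = texture;
    barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_UNORDERED_ACCESS;
    barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE;
    cmdList->ResourceBarrier(1, &barrier);
}

// Enhanced barrier: sync scope, access, and layout are each spelled
// out, so only compute work is blocked and cache operations can be
// narrowed to what the access change actually requires.
void enhancedTransition(ID3D12GraphicsCommandList7* cmdList,
                        ID3D12Resource* texture) {
    D3D12_TEXTURE_BARRIER tex = {};
    tex.SyncBefore   = D3D12_BARRIER_SYNC_COMPUTE_SHADING;
    tex.SyncAfter    = D3D12_BARRIER_SYNC_COMPUTE_SHADING;
    tex.AccessBefore = D3D12_BARRIER_ACCESS_UNORDERED_ACCESS;
    tex.AccessAfter  = D3D12_BARRIER_ACCESS_SHADER_RESOURCE;
    tex.LayoutBefore = D3D12_BARRIER_LAYOUT_UNORDERED_ACCESS;
    tex.LayoutAfter  = D3D12_BARRIER_LAYOUT_SHADER_RESOURCE;
    tex.pResource    = texture;

    D3D12_BARRIER_GROUP group = {};
    group.Type = D3D12_BARRIER_TYPE_TEXTURE;
    group.NumBarriers = 1;
    group.pTextureBarriers = &tex;
    cmdList->Barrier(1, &group);
}
```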
189

Cache Characterization and Performance Studies Using Locality Surfaces

Sorenson, Elizabeth Schreiner 14 July 2005
Today's processors commonly use caches to help overcome the disparity between processor and main memory speeds. Due to the principle of locality, most of the processor's requests for data are satisfied by the fast cache memory, resulting in a significant performance improvement. Methods for evaluating workloads and caches in terms of locality are valuable for cache design. In this dissertation, we present a locality surface which displays both temporal and spatial locality on one three-dimensional graph. We provide a solid, mathematical description of locality data and equations for visualization. We then use the locality surface to examine the locality of a variety of workloads from the SPEC CPU 2000 benchmark suite. These surfaces contain a number of features that represent sequential runs, loops, temporal locality, striding, and other patterns from the input trace. The locality surface can also be used to evaluate methodologies that involve locality. For example, we evaluate six synthetic trace generation methods and find that none of them accurately reproduce an original trace's locality. We then combine a mathematical description of caches with our locality definition to create cache characterization surfaces. These new surfaces visually relate how references with varying degrees of locality function in a given cache. We examine how varying the cache size, line size, and associativity affect a cache's response to different types of locality. We formally prove that the locality surface can predict the miss rate in some types of caches. Our locality surface matches well with cache simulation results, particularly for caches with large associativities. We can qualitatively choose prudent values for cache and line size. Further, the locality surface can predict the miss rate with 100% accuracy for some fully associative caches and with some error for set associative caches. One drawback to the locality surface is the time intensity of the stack-based algorithm. We provide a new parallel algorithm that reduces the computation time significantly. With this improvement, the locality surface becomes a viable and valuable tool for characterizing workloads and caches, predicting cache simulation results, and evaluating any procedure involving locality.
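The stack-based algorithm whose cost the dissertation reduces is, at heart, LRU stack-distance computation over a reference trace. A simple quadratic reference version is sketched below to show what is being sped up; the dissertation's parallel algorithm is not reproduced here.

```cpp
// Naive LRU stack-distance computation: for each reference, the
// distance is the number of distinct addresses touched since the
// previous reference to the same address (-1 for a cold reference).
// This O(n*d) form is the time-intensive baseline.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

std::vector<long> stackDistances(const std::vector<std::uint64_t>& trace) {
    std::vector<std::uint64_t> stack;        // most recent at the back
    std::vector<long> dist;
    dist.reserve(trace.size());
    for (std::uint64_t addr : trace) {
        auto it = std::find(stack.rbegin(), stack.rend(), addr);
        if (it == stack.rend()) {
            dist.push_back(-1);                    // cold (first) reference
        } else {
            dist.push_back(it - stack.rbegin());   // 0 = re-hit of MRU entry
            stack.erase(std::next(it).base());     // remove old position
        }
        stack.push_back(addr);               // address becomes most recent
    }
    return dist;
}

int main() {
    // Trace A B A C B: A and B are cold, A re-hits at depth 1,
    // C is cold, B re-hits at depth 2.
    for (long d : stackDistances({1, 2, 1, 3, 2})) printf("%ld ", d);
    printf("\n");                            // prints: -1 -1 1 -1 2
    return 0;
}
```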
190

Location Cache Design and Performance Analysis for Chip Multiprocessors

Nemeth, Jason 19 September 2008
No description available.
