Global ETD Search

581	Optimal Placement of Video Caching Routers for Minimization of Retransmission Delay Shakya, Rosish 08 August 2011 (has links) No description available. Communication Computer Engineering Electrical Engineering Multimedia networking video caching router optimal cache placement dynamic programming solution
582	Analysing Memory Performance when computing DFTs using FFTW / Analys av minneshantering vid beräkning av DFTs med FFTW Heiskanen, Andreas, Johansson, Erik January 2018 (has links) Discrete Fourier Transforms (DFTs) are used in a wide variety of dif-ferent scientific areas. In addition, there is an ever increasing demand on fast and effective ways of computing DFT problems with large data sets. The FFTW library is one of the most common used libraries when computing DFTs. It adapts to the system architecture and predicts the most effective way of solving the input problem. Previous studies have proved the FFTW library to be superior to other DFT solving libraries. However, not many have specifically examined the cache memory performance, which is a key factor for overall performance. In this study, we examined the cache memory utilization when computing 1-D complex DFTs using the FFTW library. Testing was done using bench FFT, Linux Perf and testing scripts. The results from this study show that cache miss ratio increases with problem size when the input size is smaller than the theoretical input size matching the cache capacity. This is also verified by the results from the L2 prefetcher miss ratio. However, the study show that cache miss ratio stabilizes when exceeding the cache capacity. In conclusion, it is possible to use bench FFT and Linux Perf to measure cache memory utilization. Also, the analysis shows that cache memory performance is good when computing 1-D complex DFTS using the FFTW library, since the miss ratios stabilizes at low values. However, we suggest further examination ofthe memory behaviour for DFT computations using FFTW with larger input sizes and a more in-depth testing method. / Diskret Fouriertransform (DFT) används inom många olika vetenskapliga områden. Det finns en ökande efterfrågan på snabba och effektiva sätt att beräkna DFT-problem med stora mängder data. FFTW-biblioteket är ett av de mest använda biblioteken vid beräkning av DFT-problem. FFTW-biblioteket anpassar sig till systemarkitekturen och försöker generera det mest effektiva sättet att lösa ett givet DFT-problem. Tidigare studier har visat att FFTW-biblioteket är effektivare än andra bibliotek som kan användas för att lösa DFT-problem. Däremot har studierna inte fokuserat på minneshanteringen, vilket är en nyckelfaktor för den generella prestandan. I den här studien undersökte vi FFTW-bibliotekets cache-minneshanteringen vid beräkning av 1-D komplexa DFT-problem. Tester utfördes med hjälp av bench FFT, Linux Perf och testskript. Resultaten från denna studie visar att cache-missförhållandet ökar med problemstorleken när problemstorleken ärmindre än den teoretiska problemstorleken som matchar cachekapaciteten. Detta bekräftas av resultat från L2-prefetcher-missförhållandet. Studien visar samtidigt att cache-missförhållandet stabiliseras när problemstorleken överskrider cachekapaciteten. Sammanfattningsvis går det att argumentera för att det är möjligt att använda bench FFT och Linux Perf för att mäta cache-minneshanteringen. Analysen visar också att cache-minneshanteringen är bra vid beräkning av 1-D komplexa DFTs med hjälp av FFTW-biblioteket eftersom missförhållandena stabiliseras vid låga värden. Vi föreslår dock ytterligare undersökning av minnesbeteendet för DFT-beräkningar med hjälp av FFTW där problemstorlekarna är större och en mer genomgående testmetod används. fftw discrete fourier transform fast fourier transform hpc perf cache memory analysis Computer Sciences Datavetenskap (datalogi)
583	Efficient Search for Cost-Performance Optimal Caches Lima-Engelmann, Tobias January 2024 (has links) CPU cache hierarchies are the central solution in bridging the memory wall. A proper understanding of how to trade-off their high cost against performance can lead to cost-savings without sacrificing performance.Due to the combinatorial nature of the problem, there exist a large number of configurations to investigate, making design space exploration slow and cumbersome. To improve this process, this Thesis develops and evaluates a model for optimally trading-off cost and performance of CPU cache hierarchies, named the Optimal Cache Problem (OCP), in the form of a Non-linear Integer Problem. A second goal of this work is the development of an efficient solver for the OCP, which was found to be a branch & bound algorithm and proven to function correctly. Experiments were conducted to empirically analyse and validate the model and to showcase possible use-cases. There, it was possible to ascribe the model outputs on measurable performance metrics. The model succeeded in formalising the inherent trade-off between cost and performance in a way that allows for an efficient and complete search of the configuration space of possible cache hierarchies. In future work, the model needs to be refined and extended to allow for the simultaneous analysis of multiple programs. CPU Cache Hierarchies Design Space Exploration Branch and Bound Method Computer Sciences Datavetenskap (datalogi)
584	Evaluation of cache memory configurations with performance monitoring in embedded real-time automotive systems : Determining performance characteristics of cache memory with hardware counters and software profiling. / Utvärdning av cacheminnekonfigurationer med prestandamätning i realtidsstyrda fordonssystem : Bestämning av prestandaegenskaper i cacheminnen med hårdvaruräknare och mjukvaruprofilering Westman, Andreas January 2022 (has links) Modern day automotive systems are highly dependent on real-time software control to manage the powertrain and high-level features, such as cruise control. The computational power available has increased tremendously from decades of microcontroller and hardware development on such platforms. In contrast, the access times to the memory are still substantial, creating a significant bottleneck in the system. Therefore, small cache memories are used to reduce access times and improve performance. With significantly smaller but faster memory, the configuration and behaviour of the cache play an important role and are also highly dependent on the platform. Several of the configurations have an impact on the platform behaviour not only in terms of execution time, but also in multithreaded coherency, robustness, security, and internal bus usage. To distinguish performance differences and cache behaviour between configurations, hardware counters and low-level processor events such as bus usage, line fills, reads, and writes are monitored in conjunction with task load profiling. This proves to be an effective measurement method for use in a real-time embedded automotive system to provide both average and worstcase scenarios. In addition, the collected results are used to suggest improvements to the configuration of the platform used for measurements. For example, no major performance benefits were measured from excluding certain parts of the memory to increase hit rate. Less robust write-policies copy-back proved to be more efficient and could be used in combination with error correction to increase security. Memory coherency in multithreaded execution also proved to be inefficient and a major source to increased miss-rate due to snooping. / Moderna fordonssystem är idag mycket beroende av realtidsmjukvara för att effektivt kontrollera både drivlina och med användarfunktioner som till exempel farthållare. Beräkningskraften tillgänglig på de mikrokontroller som används har ökat kraftigt från årtionden av utveckling. Åtkomsttiden mellan processorn och minnet är däremot fortfarande stor och skapar en stor flaskhals i systemet. För att minska åtkomsttiden används cacheminnen med mycket hög prestanda och begränsad minnesmängd. Med väsentligt mindre och snabbare cacheminnen krävs optimerade konfigurationer för att utnyttja minnet effektivt, vilket kan vara svårt då användningen och prestandan är varierande för olika system. Fler cachekonfigurationer påverkar systemet i mer än bara exekveringstid utan och i minnessynkronisering, tillförlitlighet, säkerhet och intern bussanvändning. För att särskilja olika prestandaegenskaper mellan olika konfigurationer används hårdvaruräknare och processorhändelser som bussanvändning, radändringar, läsningar och skrivningar i kombination med profilering av processoranvändning. Det visar sig vara en effektiv metod för att utvärdera olika scenarion som bästa-, sämsta-, och medelfall i realtidssystem i fordon. Utöver det, används resultaten för att föreslå nya konfigurationsförbättringar på plattformen som användes. Några exempel på detta är hur försök till att förbättra minnesträffar i cacheminnet genom att exkludera vissa typer av minnessektioner inte gav någon prestandaförbättring. Mindre tillförlitliga skrivmetoder som copy-back visade sig vara mer effektiva och kunde användas i kombination med feldetektering för att förbättra säkerheten. cache memory real-time hardware counters automotive cacheminne realtid hårdvaruräkare fordon Computer and Information Sciences Data- och informationsvetenskap
585	An Energy Efficient Data Cache Implementing 2-way LRC Architecture Musalappa, Saibhushan 09 December 2006 (has links) Conventional level one data caches are widely used in high-performance microprocessors. Shrinking process parameters in chip fabrication technology allow a much larger number of devices on a chip with every new generation. This reduction in device size has led to an increase in the magnitude of a type of energy dissipation hitherto ignored?leakage energy. Transistor level leakage energy research for sub-micron processes has shown that leakage can be as much as or greater than the dynamic energy for advanced circuit designs. Researchers have devised techniques to reduce leakage energy at the fabrication and circuit levels. Transitioning the idle circuits from operating voltage to a reduced voltage is one such circuit-level technique. The ELRU-SEQ replacement policy exploits this technique to control cache bank transitions. This thesis proposes a new cache architecture called 2-way Leakage Reduction Cache (LRC) that uses this replacement policy. The architecture employs xor-mapping function to reduce conflict misses. leakage energy skew mapping lru-seq replacement data cache transition circuit
586	Compiler Techniques for Transformation Verification, Energy Efficiency and Cache Modeling Bao, Wenlei 13 September 2018 (has links) No description available. Computer Science compiler optimization polyhedral compilation program verification energy optimization DVFS cache modeling vulnerability analysis
587	B+ TREE CACHE MEMORY PERFORMANCE GIESKE, EDMUND J. 06 October 2004 (has links) No description available. B+Tree Cache Memory Conscious Data Structure File System Index Database Table
588	A SRAM-Based Log Buffer Flash Translation Layer for Solid State Disk using Fully-Associative Sector Translation Heng, Li 17 April 2009 (has links) No description available. Engineering Log block LOG BUFFER FLASH block and data LOG cache block
589	A Novel Cache Migration Scheme in Network-on-Chip Devices Nafziger, Jonathan W. 06 December 2010 (has links) No description available. Electrical Engineering Network-on-chip NoC Cache Data Migration Spatial Locality Temporal Locality
590	A Context-Aware Approach to Android Memory Management Muthu, Srinivas 14 November 2016 (has links) No description available. Computer Science Context Awareness Android Android Process Cache Android Memory Management Context Aware Operating System

Search results