• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 4
  • 1
  • Tagged with
  • 5
  • 5
  • 4
  • 4
  • 4
  • 4
  • 4
  • 3
  • 3
  • 3
  • 2
  • 2
  • 2
  • 2
  • 2
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Precise Analysis of Private And Shared Caches for Tight WCET Estimates

Nagar, Kartik January 2016 (has links) (PDF)
Worst Case Execution Time (WCET) is an important metric for programs running on real-time systems, and finding precise estimates of a program’s WCET is crucial to avoid over-allocation and wastage of hardware resources and to improve the schedulability of task sets. Hardware Caches have a major impact on a program’s execution time, and accurate estimation of a program’s cache behavior generally leads to significant reduction of its estimated WCET. However, the cache behavior of an access cannot be determined in isolation, since it depends on the access history, and in multi-path programs, the sequence of accesses made to the cache is not fixed. Hence, the same access can exhibit different cache behavior in different execution instances. This issue is further exacerbated in shared caches in a multi-core architecture, where interfering accesses from co-running programs on other cores can arrive at any time and modify the cache state. Further, cache analysis aimed towards WCET estimation should be provably safe, in that the estimated WCET should always exceed the actual execution time across all execution instances. Faced with such contradicting requirements, previous approaches to cache analysis try to find memory accesses in a program which are guaranteed to hit the cache, irrespective of the program input, or the interferences from other co-running programs in case of a shared cache. To do so, they find the worst-case cache behavior for every individual memory access, analyzing the program (and interferences to a shared cache) to find whether there are execution instances where an access can super a cache miss. However, this approach loses out in making more precise predictions of private cache behavior which can be safely used for WCET estimation, and is significantly imprecise for shared cache analysis, where it is often impossible to guarantee that an access always hits the cache. In this work, we take a fundamentally different approach to cache analysis, by (1) trying to find worst-case behavior of groups of cache accesses, and (2) trying to find the exact cache behavior in the worst-case program execution instance, which is the execution instance with the maximum execution time. For shared caches, we propose the Worst Case Interference Placement (WCIP) technique, which finds the worst-case timing of interfering accesses that would cause the maximum number of cache misses on the worst case execution path of the program. We first use Integer Linear Programming (ILP) to find an exact solution to the WCIP problem. However, this approach does not scale well for large programs, and so we investigate the WCIP problem in detail and prove that it is NP-Hard. In the process, we discover that the source of hardness of the WCIP problem lies in finding the worst case execution path which would exhibit the maximum execution time in the presence of interferences. We use this observation to propose an approximate algorithm for performing WCIP, which bypasses the hard problem of finding the worst case execution path by simply assuming that all cache accesses made by the program occur on a single path. This allows us to use a simple greedy algorithm to distribute the interfering accesses by choosing those cache accesses which could be most affected by interferences. The greedy algorithm also guarantees that the increase in WCET due to interferences is linear in the number of interferences. Experimentally, we show that WCIP provides substantial precision improvement in the final WCET over previous approaches to shared cache analysis, and the approximate algorithm almost matches the precision of the ILP-based approach, while being considerably faster. For private caches, we discover multiple scenarios where hit-miss predictions made by traditional Abstract Interpretation-based approaches are not sufficient to fully capture cache behavior for WCET estimation. We introduce the concept of cache miss paths, which are abstractions of program path along which an access can super a cache miss. We propose an ILP-based approach which uses cache miss paths to find the exact cache behavior in the worst-case execution instance of the program. However, the ILP-based approach needs information about the worst-case execution path to predict the cache behavior, and hence it is difficult to integrate it with other micro-architectural analysis. We then show that most of the precision improvement of the ILP-based approach can be recovered without any knowledge of the worst-case execution path, by a careful analysis of the cache miss paths themselves. In particular, we can use cache miss paths to find the worst-case behavior of groups of cache accesses. Further, we can find upper bounds on the maximum number of times that cache accesses inside loops can exhibit worst-case behavior. This results in a scalable, precise method for performing private cache analysis which can be easily integrated with other micro-architectural analysis.
2

A memory profiler for 3D graphics application using ninary instrumentation

Deo, Mrinal 25 July 2011 (has links)
This report describes the architecture and implementation of a memory profiler for 3D graphics applications. The memory profiling is done for parts of the program which runs on the graphics processor and is responsible for rendering the image. The shaders are parsed and every memory instruction is instrumented with additional instruction for profiling. The results are then transferred from the video memory to CPU memory. Profiling is done for a frame and completes in less than three minutes. The report also describes various analyses that can be done using the results obtained from this profiler. The report discusses the design of an analytical cache model that can be used to identify candidate memory buffers suitable for caching among all the buffers used by an application. The profiler can segregate results for reads and writes separately, can handle all formats of texture access instructions and predicated instructions. / text
3

New Techniques for Building Timing-Predictable Embedded Systems

Guan, Nan January 2013 (has links)
Embedded systems are becoming ubiquitous in our daily life. Due to close interaction with physical world, embedded systems are typically subject to timing constraints. At design time, it must be ensured that the run-time behaviors of such systems satisfy the pre-specified timing constraints under any circumstance. In this thesis, we develop techniques to address the timing analysis problems brought by the increasing complexity of underlying hardware and software on different levels of abstraction in embedded systems design. On the program level, we develop quantitative analysis techniques to predict the cache hit/miss behaviors for tight WCET estimation, and study two commonly used replacement policies, MRU and FIFO, which cannot be analyzed adequately using the state-of-the-art qualitative cache analysis method. Our quantitative approach greatly improves the precision of WCET estimation and discloses interesting predictability properties of these replacement policies, which are concealed in the qualitative analysis framework. On the component level, we address the challenges raised by multi-core computing. Several fundamental problems in multiprocessor scheduling are investigated. In global scheduling, we propose an analysis method to rule out a great part of impossible system behaviors for better analysis precision, and establish conditions to guarantee the bounded responsiveness of computing tasks. In partitioned scheduling, we close a long standing open problem to generalize the famous Liu and Layland's utilization bound in uniprocessor real-time scheduling to multiprocessor systems. We also propose to use cache partitioning for multi-core systems to avoid contentions on shared caches, and solve the underlying schedulability analysis problem. On the system level, we present techniques to improve the Real-Time Calculus (RTC) analysis framework in both efficiency and precision. First, we have developed Finitary Real-Time Calculus to solve the scalability problem of the original RTC due to period explosion. The key idea is to only maintain and operate on a limited prefix of each curve that is relevant to the final results during the whole analysis procedure. We further improve the analysis precision of EDF components in RTC, by precisely bounding the response time of each computation request.
4

Cache Prediction and Execution Time Analysis on Real-Time MPSoC

Neikter, Carl-Fredrik January 2008 (has links)
<p>Real-time systems do not only require that the logical operations are correct. Equally important is that the specified time constraints always are complied. This has successfully been studied before for mono-processor systems. However, as the hardware in the systems gets more complex, the previous approaches become invalidated. For example, multi-processor systems-on-chip (MPSoC) get more and more common every day, and together with a shared memory, the bus access time is unpredictable in nature. This has recently been resolved, but a safe and not too pessimistic cache analysis approach for MPSoC has not been investigated before. This thesis has resulted in designed and implemented algorithms for cache analysis on real-time MPSoC with a shared communication infrastructure. An additional advantage is that the algorithms include improvements compared to previous approaches for mono-processor systems. The verification of these algorithms has been performed with the help of data flow analysis theory. Furthermore, it is not known how different types of cache miss characteristic of a task influence the worst case execution time on MPSoC. Therefore, a program that generates randomized tasks, according to different parameters, has been constructed. The parameters can, for example, influence the complexity of the control flow graph and average distance between the cache misses.</p>
5

Cache Prediction and Execution Time Analysis on Real-Time MPSoC

Neikter, Carl-Fredrik January 2008 (has links)
Real-time systems do not only require that the logical operations are correct. Equally important is that the specified time constraints always are complied. This has successfully been studied before for mono-processor systems. However, as the hardware in the systems gets more complex, the previous approaches become invalidated. For example, multi-processor systems-on-chip (MPSoC) get more and more common every day, and together with a shared memory, the bus access time is unpredictable in nature. This has recently been resolved, but a safe and not too pessimistic cache analysis approach for MPSoC has not been investigated before. This thesis has resulted in designed and implemented algorithms for cache analysis on real-time MPSoC with a shared communication infrastructure. An additional advantage is that the algorithms include improvements compared to previous approaches for mono-processor systems. The verification of these algorithms has been performed with the help of data flow analysis theory. Furthermore, it is not known how different types of cache miss characteristic of a task influence the worst case execution time on MPSoC. Therefore, a program that generates randomized tasks, according to different parameters, has been constructed. The parameters can, for example, influence the complexity of the control flow graph and average distance between the cache misses.

Page generated in 0.0614 seconds