Spelling suggestions: "subject:"chip multiprocessor"" "subject:"chip multioprocessors""
1 |
ENHANCING FAIRNESS AND PERFORMANCE ON CHIP MULTI-PROCESSOR PLATFORMS WITH CONTENTION-AWARE SCHEDULING POLICIESMarinakis, Theodoros 01 December 2019 (has links) (PDF)
Chip Multi-Processor (CMP) platforms, well-established in the server, desktop and embedded domain, succeeded in overcoming the power consumption and heat dissipation bottlenecks by integrating multiple cores, less complex and powerful than their single-core ancestors, in a single die. A major issue induced by the design of the CMPs is contention for the shared resources of the platform, Last Level Cache (LLC) and main memory bandwidth. Applications, running concurrently on the cores, compete with each other for the shared resources, and are subject to performance degradation. The way applications are assigned to the CMP, is crucial for the overall performance of the system. A scheduling policy that accounts for contention will bring high performance speed-ups, whereas an agnostic one will generate unpredictable contention conditions. For this reason the significance of the scheduler has been elevated, as it is the component that determines which applications utilize the resources each time period.In this thesis, we address cross-core interference on CMP platforms, by designing scheduling policies that improve performance and fairness. We deal with contention in three ways. In our first approach, we incorporate the notion of progress in order to balance unfairness among the applications of the workload. Performance degradation is not evenly distributed and progress greatly varies among them. In order to provide a fair execution environment, we monitor, at run-time, applications assigned to the CPU and prioritize them based on the extent at which they are affected by contention.In our second approach, we target performance by mitigating contention on shared resources. It is necessary to decide, out of all the possible application schedules, the one that generates the least amount of resource interference. To achieve that, the first indispensable step is to extract an interference profile for the applications executed on the CMP. We accomplish that by applying pressure to all levels of memory hierarchy and identifying the point at which performance is compromised. From our analysis, we understand that shared resources can tolerate pressure of certain amount; applications can be grouped together if the overall generated pressure does not reach the saturation point of the shared resources. Having extracted this information, we proceed to the placement of the application in such a way that overall resource requirements are as balanced as possible across the execution.Finally, we design a policy in order to improve performance and fairness at the same time. Applications that heavily rely on the LLC are separated from those with high main memory bandwidth, in order to avoid the destructive effects caused by the LLC thrashing behavior of the latter. The group executed on the CPU is determined based on the key observation that the overall requirements of the group should not exceed the saturation limits of the CMP. Additionally, during execution, the progress for each application is estimated and those with the least accumulated progress are prioritized.Our proposed policies are evaluated in an Intel Xeon E5-2620 v3 processor. A variety of benchmark suites were utilized to generate mixes of diverse characteristics. Our methodologies are implemented in user-space and can be deployed on Linux-based systems. Experimental results show the benefits of tackling contention in shared resources. We achieve throughput gains of up to 16% and unfairness is reduced by 2.37x on average compared to Linux scheduler.
|
2 |
Power Constrained Performance Optimization in Chip Multi-processorsMa, Kai 03 September 2013 (has links)
No description available.
|
3 |
Resource Optimized Scheduling For Enhanced Power Efficiency And Throughput On Chip Multi Processor PlatformsKundan, Shivam 01 May 2024 (has links) (PDF)
The parallel nature of process execution on Chip Multi-Processors (CMPs) has boosted levels of application performance far beyond the capabilities of erstwhile single-core designs. Generally, CMPs offer improved performance by integrating multiple simpler cores onto a single die that share certain computing resources among them such as last-level caches, data buses, and main memory. This ensures architectural simplicity while also boosting performance for multi-threaded applications. However, a major trade-off associated with this approach is that concurrently executing applications incur performance degradation if their collective resource requirements exceed the total amount of resources available to the system. If dynamic resource allocation is not carefully considered, the potential performance gain from having multiple cores may be outweighed by the losses due to contention for allocation of shared resources. Additionally, CMPs with inbuilt dynamic voltage-frequency scaling (DVFS) mechanisms may try to compensate for the performance bottleneck by scaling to higher clock frequencies. For performance degradation due to shared-resource contention, this does not necessarily improve performance but does ensure a significant penalty on power consumption due to the quadratic relation of electrical power and voltage (P_dynamic ∝ V^2 * f).This dissertation presents novel methodologies for balancing the competing requirements of high performance, fairness of execution, and enforcement of priority, while also ensuring overall power efficiency of CMPs. Specifically, we (1) Analyze the problem of resource interference during concurrent process execution and propose two fine-grained scheduling methodologies for improving overall performance and fairness, (2) Develop an approach for enforcement of priority (i.e., minimum performance) for specific processes while avoiding resource starvation for others, and (3) Present a machine-learning approach for maximizing the power efficiency (performance-per-Watt) of CMPs through estimation of a workload's performance and power consumption limits at different clock frequencies.As modern computing workloads become increasingly dynamic, and computers themselves become increasingly ubiquitous, the problem of finding the ideal balance between performance and power consumption of CMPs is of particular relevance today, especially given the unprecedented proliferation of embedded devices for use in Internet-of-Things, edge computing, smart wearables, and even exotic experiments such as space probes comprised entirely of a CMP, sensors, and an antenna ("space chips"). Additionally, reducing power consumption while maintaining constant performance can contribute to addressing the growing problem of dark silicon.
|
Page generated in 0.0951 seconds