Spelling suggestions: "subject:"throttling"" "subject:"bottling""
1 |
A Simple Throttling Concept for Multithreaded Application ServersStridh, Fredrik January 2009 (has links)
Multithreading is today a very common technology to achieve concurrency within software. Today there exists three commonly used threading strategies for multithreaded application servers. These are thread per client, thread per request and thread pool. Earlier studies has shown that the choice of threading strategy is not that important. Our measurements show that the choice of threading architecture becomes more important when the application comes under high load. We will in this study present a throttling concept which can give thread per client almost as good qualities as the thread pool strategy when it comes to performance. No architecture change is required. This concept has been evaluated on three types of hardware, ranging from 1 to 64 CPUs, using 6 alternatives loads and both in C and Java. We have also identified that there is a high correlation between average response times and the length of the run time queue. This can be used to construct a self tuning throttling algorithm that makes the introduction of the throttle concept even simpler, since it does require any configuring.
|
2 |
Modeling and Runtime Systems for Coordinated Power-Performance ManagementLi, Bo 28 January 2019 (has links)
Emergent systems in high-performance computing (HPC) expect maximal efficiency to achieve the goal of power budget under 20-40 megawatts for 1 exaflop set by the Department of Energy. To optimize efficiency, emergent systems provide multiple power-performance control techniques to throttle different system components and scale of concurrency. In this dissertation, we focus on three throttling techniques: CPU dynamic voltage and frequency scaling (DVFS), dynamic memory throttling (DMT), and dynamic concurrency throttling (DCT). We first conduct an empirical analysis of the performance and energy trade-offs of different architectures under the throttling techniques. We show the impact on performance and energy consumption on Intel x86 systems with accelerators of Intel Xeon Phi and a Nvidia general-purpose graphics processing unit (GPGPU). We show the trade-offs and potentials for improving efficiency. Furthermore, we propose a parallel performance model for coordinating DVFS, DMT, and DCT simultaneously. We present a multivariate linear regression-based approach to approximate the impact of DVFS, DMT, and DCT on performance for performance prediction. Validation using 19 HPC applications/kernels on two architectures (i.e., Intel x86 and IBM BG/Q) shows up to 7% and 17% prediction error correspondingly. Thereafter, we develop the metrics for capturing the performance impact of DVFS, DMT, and DCT. We apply the artificial neural network model to approximate the nonlinear effects on performance impact and present a runtime control strategy accordingly for power capping. Our validation using 37 HPC applications/kernels shows up to a 20% performance improvement under a given power budget compared with the Intel RAPL-based method. / Ph. D. / System efficiency on high-performance computing (HPC) systems is the key to achieving the goal of power budget for exascale supercomputers. Techniques for adjusting the performance of different system components can help accomplish this goal by dynamically controlling system performance according to application behaviors. In this dissertation, we focus on three techniques: adjusting CPU performance, memory performance, and the number of threads for running parallel applications. First, we profile the performance and energy consumption of different HPC applications on both Intel systems with accelerators and IBM BG/Q systems. We explore the trade-offs of performance and energy under these techniques and provide optimization insights. Furthermore, we propose a parallel performance model that can accurately capture the impact of these techniques on performance in terms of job completion time. We present an approximation approach for performance prediction. The approximation has up to 7% and 17% prediction error on Intel x86 and IBM BG/Q systems respectively under 19 HPC applications. Thereafter, we apply the performance model in a runtime system design for improving performance under a given power budget. Our runtime strategy achieves up to 20% performance improvement to the baseline method.
|
3 |
Scalable and Energy Efficient Execution Methods for Multicore SystemsLi, Dong 16 February 2011 (has links)
Multicore architectures impose great pressure on resource management. The exploration spaces available for resource management increase explosively, especially for large-scale high end computing systems. The availability of abundant parallelism causes scalability concerns at all levels. Multicore architectures also impose pressure on power management. Growth in the number of cores causes continuous growth in power.
In this dissertation, we introduce methods and techniques to enable scalable and energy efficient execution of parallel applications on multicore architectures. We study strategies and methodologies that combine DCT and DVFS for the hybrid MPI/OpenMP programming model. Our algorithms yield substantial energy saving (8.74% on average and up to 13.8%) with either negligible performance loss or performance gain (up to 7.5%).
To save additional energy for high-end computing systems, we propose a power-aware MPI task aggregation framework. The framework predicts the performance effect of task aggregation in both computation and communication phases and its impact in terms of execution time and energy of MPI programs. Our framework provides accurate predictions that lead to substantial energy saving through aggregation (64.87% on average and up to 70.03%) with tolerable performance loss (under 5%).
As we aggregate multiple MPI tasks within the same node, we have the scalability concern of memory registration for high performance networking. We propose a new memory registration/deregistration strategy to reduce registered memory on multicore architectures with helper threads. We investigate design polices and performance implications of the helper thread approach. Our method efficiently reduces registered memory (23.62% on average and up to 49.39%) and avoids memory registration/deregistration costs for reused communication memory. Our system enables the execution of application input sets that could not run to the completion with the memory registration limitation. / Ph. D.
|
4 |
Improving the Efficiency of Parallel Applications on Multithreaded and Multicore SystemsCurtis-Maury, Matthew 15 April 2008 (has links)
The scalability of parallel applications executing on multithreaded and multicore multiprocessors is often quite limited due to large degrees of contention over shared resources on these systems. In fact, negative scalability frequently occurs such that a non-negligable performance loss is observed through the use of more processors and cores. In this dissertation, we present a prediction model for identifying efficient operating points of concurrency in multithreaded scientific applications in terms of both performance as a primary objective and power secondarily. We also present a runtime system that uses live analysis of hardware event rates through the prediction model to optimize applications dynamically. We discuss a dynamic, phase-aware performance prediction model (DPAPP), which combines statistical learning techniques, including multivariate linear regression and artificial neural networks, with runtime analysis of data collected from hardware event counters to locate optimal operating points of concurrency. We find that the scalability model achieves accuracy approaching 95%, sufficiently accurate to identify improved concurrency levels and thread placements from within real parallel scientific applications.
Using DPAPP, we develop a prediction-driven runtime optimization scheme, called ACTOR, which throttles concurrency so that power consumption can be reduced and performance can be set at the knee of the scalability curve of each parallel execution phase in an application. ACTOR successfully identifies and exploits program phases where limited scalability results in a performance loss through the use of more processing elements, providing simultaneous reductions in execution time by 5%-18% and power consumption by 0%-11% across a variety of parallel applications and architectures. Further, we extend DPAPP and ACTOR to include support for runtime adaptation of DVFS, allowing for the synergistic exploitation of concurrency throttling and DVFS from within a single, autonomically-acting library, providing improved energy-efficiency compared to either approach in isolation. / Ph. D.
|
5 |
Prediction Models for Multi-dimensional Power-Performance Optimization on Many CoresShah, Ankur Savailal 28 May 2008 (has links)
Power has become a primary concern for HPC systems. Dynamic voltage and frequency scaling (DVFS) and dynamic concurrency throttling (DCT) are two software tools (or knobs) for reducing the dynamic power consumption of HPC systems. To date, few works have considered the synergistic integration of DVFS and DCT in performance-constrained systems, and, to the best of our knowledge, no prior research has developed application-aware simultaneous DVFS and DCT controllers in real systems and parallel programming frameworks. We present a multi-dimensional, online performance prediction framework, which we deploy to address the problem of simultaneous runtime optimization of DVFS, DCT, and thread placement on multi-core systems. We present results from an implementation of the prediction framework in a runtime system linked to the Intel OpenMP runtime environment and running on a real dual-processor quad-core system as well as a dual-processor dual-core system. We show that the prediction framework derives near-optimal settings of the three power-aware program adaptation knobs that we consider. Our overall runtime optimization framework achieves significant reductions in energy (12.27% mean) and ED² (29.6% mean), through simultaneous power savings (3.9% mean) and performance improvements (10.3% mean). Our prediction and adaptation framework outperforms earlier solutions that adapt only DVFS or DCT, as well as one that sequentially applies DCT then DVFS.
Further, our results indicate that prediction-based schemes for runtime adaptation compare favorably and typically improve upon heuristic search-based approaches in both performance and energy savings. / Master of Science
|
6 |
DESIGN OF EFFICIENT PACKET MARKING-BASED CONGESTION MANAGEMENT TECHNIQUES FOR CLUSTER INTERCONNECTSFerrer Pérez, Joan Lluís 19 December 2012 (has links)
El crecimiento de los computadores paralelos basados en redes de altas prestaciones ha aumentado el interés y esfuerzo de la comunidad investigadora en desarrollar nuevas técnicas que permitan obtener el mejor rendimiento de estas redes. En particular, el desarrollo de nuevas técnicas que permitan un encaminamiento eficiente y que reduzcan la latencia de los paquetes, aumentando así la productividad de la red. Sin embargo, una alta tasa de utilización de la red podría conllevar el que se conoce como "congestión de red", el cual puede causar una degradación del rendimiento.
El control de la congestión en redes multietapa es un problema importante que no está completamente resuelto. Con el fin de evitar la degradación del rendimiento de la red cuando aparece congestión, se han propuesto diferentes mecanismos para el control de la congestión. Muchos de estos mecanismos están basados en notificación explícita de la congestión. Para este propósito, los switches detectan congestión y dependiendo de la estrategia aplicada, los paquetes son marcados con la finalidad de advertir a los nodos origenes. Como respuesta, los nodos origenes aplican acciones correctivas para ajustar su tasa de inyección de paquetes.
El propósito de esta tesis es analizar las diferentes estratégias de detección y corrección de la congestión en redes multietapa, y proponer nuevos mecanismos de control de la congestión encaminados a este tipo de redes sin descarte de paquetes. Las nuevas propuestas están basadas en una estrategia más refinada de marcaje de paquetes en combinación con un conjunto de acciones correctivas justas que harán al mecanismo capaz de controlar la congestión de manera efectiva con independencia del grado de congestión y de las condiciones de tráfico. / Ferrer Pérez, JL. (2012). DESIGN OF EFFICIENT PACKET MARKING-BASED CONGESTION MANAGEMENT TECHNIQUES FOR CLUSTER INTERCONNECTS [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/18197
|
7 |
RETHROTTLE : Execution Throttling In The REDEFINE SoC ArchitectureSatrawala, Amar Nath 06 1900 (has links)
REDEFINE is a reconfigurable SoC architecture that provides a unique platform for high performance and low power computing by exploiting the synergistic interaction between coarse grain dynamic dataflow model of computation (to expose abundant parallelism in the applications) and runtime composition of efficient compute structures (on the reconfigurable computation resources). Computer architectures based on the dynamic dataflow model of computation have to be an infinite resource implementation to be able to exploit all available parallelism in all applications. It is not feasible for any real architectural implementation. When limited resource implementations are considered, there is a possibility of loss of performance (inability to efficiently exploit available parallelism). In this thesis, we study the throttling of execution in the REDEFINE architecture to maximize the architecture efficiency. We have formulated it as a design space exploration problem at two levels i.e. architectural configurations and throttling schemes.
Reduced feature/high level simulation or feature specific analytical approaches are very useful for the selective study/exploration of early in design phase architectures/systems. Our approach is similar to that of SEASAME Framework which is used for the study of MPSoC (Multiprocessor SoC) architectures. We have used abstraction (feature reduction) at the levels of architecture and model of computation to make the problem approachable and practically feasible. A feature specific fast hybrid (mixed level) simulation framework for the early in design phase study is developed and implemented for the huge design space exploration (1284 throttling schemes, 128 architectural configurations and 10 applications i.e. 1.6 million executions).
We have done performance modeling in terms of selection of important performance criteria, ranking of the explored throttling schemes and investigation of the effectiveness of the design space exploration using statistical hypothesis testing. We found some interesting obvious/intuitive and some non-obvious/counterintuitive results. The two performance criteria namely Exec.T and Avg.TU were found sufficient to represent the performance and the resource usage characteristics of the architecture independent of the throttling schemes, the architectural configurations and the applications. The ranking of the throttling schemes based on the selected performance criteria is found to be statistically very significant. The intuitive throttling schemes span the range of performance from the best to the worst. We found absence of trade-off amongst all of the performance criteria. The best throttling schemes give appreciable overall performance (25%) and resource usage (37%) gains in the throttling unit simultaneously. The design space exploration of the throttling schemes is found to be fine and uniform.
|
8 |
Design And Construction Of An Educational Pump Bench With Operational ControlsGuner, Berkay 01 December 2005 (has links) (PDF)
System characteristics of automated pumping systems may change due to wear,
aging of piping, and accumulation of deposits in the system and/or due to
configuration changes. Such changes might result in conflicts between the controlling
algorithms and the actual system requirements for each particular case. The said
mismatch between the actual physical system and the software controlling it, may
result in inefficient operation of the pump which may even lead to total system
failures (overpressurization of instrumentation and sensing elements etc.) due to
temporary malfunctioning of the system components or permanent damages incurred
by them during operating under unsuitable conditions.
It is intended in this study to design and construct an experimental automated pump
bench with operational components (mechanical, electronical and instrumentation
etc.), serving in a system introducing multiple geometric heads and its controlling and
monitoring software in order to visualize effects of the above-mentioned cases for
education and training purposes.
System characteristics data acquisition module (system test module) provides the
means of recognizing new pump and system characteristics, provided that they were
changed due to some reason (throttled valve, changed pump speed, changed
flowrate or elevation of discharge etc.). Then the pump operation module enables
users to make comparative judgments by observing the effects of the abovementioned
changes.
Above-mentioned testing sequence and monitoring of changing physical quantities
were achieved by employing four pressure transducers, a custom made DC motor
operated -throttling valve with position feedback which was designed and constructed
specifically for this study and a variable frequency drive (VFD) which were all
connected to a custom made Main Control Circuit (MCC) Board.
|
9 |
Inter-Core Interference Mitigation in a Mixed Criticality SystemHinton, Michael Glenn 04 August 2020 (has links)
In this thesis, we evaluate how well isolation can be achieved between two virtual machines within a mixed criticality system on a multi-core processor. We achieve this isolation with Jailhouse, an open-source, minimalist hypervisor. We then enhance Jailhouse with core throttling, a technique we use to minimize inter-core interference between VMs. Then, we run workloads with and without core throttling to determine the effect throttling has on interference between a non-real time VM and a real-time VM. We find that Jailhouse provides excellent isolation between VMs even without throttling, and that core throttling suppresses the remaining inter-core interference to a large extent.
|
10 |
Symbiotic Audio Communication on Interactive TransportOlaleye, Olufunke I. 01 May 2007 (has links)
No description available.
|
Page generated in 0.0627 seconds