Global ETD Search

51	Modeling and Runtime Systems for Coordinated Power-Performance Management Li, Bo 28 January 2019 Emerging high-performance computing (HPC) systems must be highly energy-efficient to meet the Department of Energy's goal of delivering 1 exaflop within a power budget of 20-40 megawatts. To improve efficiency, these systems provide multiple power-performance control techniques that throttle individual system components and the degree of concurrency. In this dissertation, we focus on three throttling techniques: CPU dynamic voltage and frequency scaling (DVFS), dynamic memory throttling (DMT), and dynamic concurrency throttling (DCT). We first conduct an empirical analysis of the performance and energy trade-offs of different architectures under these throttling techniques. We show their impact on performance and energy consumption on Intel x86 systems with Intel Xeon Phi and Nvidia general-purpose graphics processing unit (GPGPU) accelerators, and we identify the trade-offs and the potential for improving efficiency. Next, we propose a parallel performance model that coordinates DVFS, DMT, and DCT simultaneously. We present a multivariate linear regression-based approach that approximates the impact of DVFS, DMT, and DCT on performance for performance prediction. Validation using 19 HPC applications/kernels on two architectures (Intel x86 and IBM BG/Q) shows prediction errors of up to 7% and 17%, respectively. We then develop metrics for capturing the performance impact of DVFS, DMT, and DCT. We apply an artificial neural network model to approximate the nonlinear effects on performance and present a corresponding runtime control strategy for power capping. Our validation using 37 HPC applications/kernels shows up to a 20% performance improvement under a given power budget compared with the Intel RAPL-based method. / Ph. D.
/ System efficiency is key to meeting the power budget for exascale supercomputers in high-performance computing (HPC). Techniques that adjust the performance of different system components can help accomplish this goal by dynamically controlling system performance according to application behavior. In this dissertation, we focus on three techniques: adjusting CPU performance, memory performance, and the number of threads used to run parallel applications. First, we profile the performance and energy consumption of different HPC applications on both Intel systems with accelerators and IBM BG/Q systems. We explore the trade-offs between performance and energy under these techniques and provide optimization insights. Next, we propose a parallel performance model that accurately captures the impact of these techniques on performance in terms of job completion time, and we present an approximation approach for performance prediction. Across 19 HPC applications, the approximation has prediction errors of up to 7% on Intel x86 systems and up to 17% on IBM BG/Q systems. Finally, we apply the performance model in the design of a runtime system that improves performance under a given power budget. Our runtime strategy achieves up to a 20% performance improvement over the baseline method. Keywords: Parallel Performance Modeling; Dynamic Voltage and Frequency Scaling; Dynamic Memory Throttling; Dynamic Concurrency Throttling; Shared-Memory Systems
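To make the regression-based prediction in entry 51 concrete, here is a minimal sketch assuming a linear model whose terms are an intercept, 1/f for CPU frequency (DVFS), a memory-throttle level (DMT), and 1/n for thread count (DCT). These terms, the function names, and the synthetic runtimes are hypothetical choices for illustration, not the dissertation's actual model; the fit is ordinary least squares solved in pure Python.

```python
from typing import List, Sequence


def features(freq_ghz: float, mem_throttle: float, threads: int) -> List[float]:
    # Hypothetical terms: intercept, inverse CPU frequency (DVFS),
    # memory-throttle level (DMT), and inverse thread count (DCT).
    return [1.0, 1.0 / freq_ghz, mem_throttle, 1.0 / threads]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting on the augmented matrix.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(rows: Sequence[List[float]], times: Sequence[float]) -> List[float]:
    # Ordinary least squares via the normal equations (X^T X) beta = X^T y.
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta: Sequence[float], row: Sequence[float]) -> float:
    return sum(b * x for b, x in zip(beta, row))


# Synthetic runtimes generated from known coefficients (illustrative only).
true_beta = [0.5, 8.0, 2.0, 40.0]
configs = [(f, m, n) for f in (1.2, 1.8, 2.4) for m in (0.0, 0.5, 1.0) for n in (4, 8, 16)]
X = [features(*c) for c in configs]
y = [predict(true_beta, row) for row in X]

beta = fit(X, y)
print([round(b, 3) for b in beta])
print(round(predict(beta, features(2.0, 0.25, 12)), 3))
```

Because the synthetic runtimes come from known coefficients, the fit recovers them; with measured runtimes, the residuals show how well a linear form fits. The abstract notes nonlinear effects, which it handles with a neural network rather than a linear model like this one.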
52	Enhancing storage performance in virtualized environments: a pro-active approach Sivathanu, Sankaran 17 May 2011 Efficient storage and retrieval of data is critical in today's computing environments, and storage systems need to keep pace with the evolution of other system components, such as CPU and memory, to build an efficient overall system. With virtualization becoming pervasive in enterprise and cloud-based infrastructures, it is vital to build I/O systems that better account for the changes virtualization introduces. However, the evolution of storage systems has been significantly limited by adherence to legacy interface standards between the operating system and the storage subsystem. Even though storage systems have recently become more powerful, hosting large processors and memory, the thin interface to the file system prevents higher layers from using vital information contained in the storage system. Virtualization compounds this problem by adding new indirection layers that make the underlying storage even more opaque to the operating system. This dissertation addresses the problem of inefficient use of disk information by identifying storage-level opportunities and developing pro-active techniques for storage management. We present a new class of storage systems called pro-active storage systems (PaSS), which, while remaining compatible with the existing I/O interface, exert a limited degree of control over file system policies by leveraging their internal information. We present our PaSS framework, which includes two new I/O interfaces called push and pull, in the context of both traditional and virtualized systems. We demonstrate the usefulness of the PaSS framework through a series of case studies that exploit information available in the underlying storage layer to improve overall I/O performance.
We also built a framework to evaluate the performance and energy of modern storage systems by implementing a novel I/O trace replay tool and an analytical model for measuring the performance and energy of complex storage systems. We believe that our PaSS framework and suite of evaluation tools help in better understanding modern storage system behavior, and thereby in implementing efficient policies in the higher layers, using the new interfaces in our framework, for better performance, data reliability, and energy efficiency. Keywords: Trace replay; Pro-active storage; IO virtualization; Energy performance modeling; Storage performance; Virtualization; Information retrieval
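A trace replay tool paired with an analytical model, as described in entry 52, can be sketched as a first-come-first-served queue driven by the trace's timestamps. The disk model below (a fixed positioning cost plus size over bandwidth, and constant active and idle power) and all parameter values are assumptions, not the dissertation's tool or model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    arrival_s: float  # timestamp recorded in the trace (trace starts at t = 0)
    size_kib: int     # request size in KiB


@dataclass
class DiskModel:
    # Hypothetical parameters for a simple analytical disk model.
    seek_s: float = 0.004               # fixed positioning cost per request
    bandwidth_kib_s: float = 100_000.0  # transfer rate
    active_w: float = 8.0               # power while servicing requests
    idle_w: float = 5.0                 # power while idle

    def service_time(self, req: Request) -> float:
        return self.seek_s + req.size_kib / self.bandwidth_kib_s


def replay(trace: List[Request], disk: DiskModel) -> Tuple[List[float], float]:
    """Replay requests at their trace timestamps through a FCFS queue.

    Returns per-request response times (in arrival order) and energy in joules.
    """
    free_at = 0.0  # time at which the disk finishes its current work
    busy = 0.0
    responses: List[float] = []
    for req in sorted(trace, key=lambda r: r.arrival_s):
        start = max(free_at, req.arrival_s)  # queue behind earlier requests
        svc = disk.service_time(req)
        free_at = start + svc
        busy += svc
        responses.append(free_at - req.arrival_s)
    idle = free_at - busy  # idle time between t = 0 and the last completion
    energy = disk.active_w * busy + disk.idle_w * idle
    return responses, energy


trace = [Request(0.000, 64), Request(0.001, 64), Request(0.020, 256)]
responses, energy = replay(trace, DiskModel())
print([round(r * 1000, 2) for r in responses], round(energy, 5))
```

The second request arrives while the first is still being serviced, so its response time includes queueing delay; the energy figure charges active power for busy time and idle power for the gaps.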
53	Schémas d'adaptations algorithmiques sur les nouveaux supports d'éxécution parallèles / Algorithmic adaptations schemas on the new parallel platforms Achour, Sami 06 July 2013 With the multitude of emerging parallel platforms characterized by hardware heterogeneity (processors, networks, ...), developing high-performance parallel applications and libraries has become a challenge. One method that has proved suitable for meeting this challenge is the adaptive approach, which uses several parameters (architectural, algorithmic, ...) to optimize the execution of an application on the target platform. Applications adopting this approach must rely on performance modeling methods to choose among the alternatives available to them (algorithms, implementations, or scheduling). The use of these modeling methods in adaptive applications must obey the constraints imposed by this context, namely the speed and accuracy of predictions. In this work, we first propose a framework for developing adaptive parallel applications based on theoretical performance modeling. We then focus on performance prediction for parallel and hierarchical environments. Specifically, we propose a framework that combines different performance modeling methods (analytical, experimental, and simulation) to strike a balance between these constraints. This framework uses the application's installation phase to discover the execution platform and the application's execution traces in order to model the behavior of two components: computing kernels and point-to-point communications. To model these components, we developed several methods based on experiments and polynomial regression that provide accurate models. The models produced during installation are then used at runtime by our performance prediction tool for MPI programs (MPI-PERF-SIM) to predict the behavior of those programs. The framework is validated separately for its individual modules, then globally on the matrix product kernel. Keywords: Adaptive; Hierarchical clusters; MPI; Performance modeling and prediction; Regression; Simulation
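The installation-time modeling in entry 53 fits point-to-point communication cost with polynomial regression and reuses the models at runtime. The sketch below assumes a quadratic model of message time versus message size and a prediction that simply sums modeled kernel and message times; the benchmark values and function names are hypothetical, and this is not MPI-PERF-SIM's interface.

```python
from typing import List, Sequence


def polyfit(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial fit; returns coefficients c0, c1, ..., c_degree."""
    k = degree + 1
    # Normal equations: A[i][j] = sum x^(i+j), b[i] = sum x^i * y.
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(x ** i * y for x, y in zip(xs, ys)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        p = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[p] = a[p], a[col]
        b[col], b[p] = b[p], b[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
                b[r] -= f * b[col]
    return [b[i] / a[i][i] for i in range(k)]


def poly(coeffs: Sequence[float], x: float) -> float:
    return sum(c * x ** i for i, c in enumerate(coeffs))


def predict_us(kernels_us: Sequence[float], messages_kib: Sequence[float],
               comm_coeffs: Sequence[float]) -> float:
    # Predicted time = modeled compute kernels + modeled point-to-point messages.
    return sum(kernels_us) + sum(poly(comm_coeffs, s) for s in messages_kib)


# Hypothetical benchmark taken during installation: message size (KiB) -> time (us).
sizes = [1, 2, 4, 8, 16, 32, 64]
times = [5.0 + 0.8 * s + 0.002 * s * s for s in sizes]
comm = polyfit(sizes, times, 2)
print([round(c, 4) for c in comm])
print(round(predict_us([120.0, 80.0], [10, 50], comm), 2))
```

Fitting once at installation keeps runtime prediction cheap: evaluating a low-degree polynomial per message is fast, which matches the speed constraint the abstract emphasizes.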
54	Improving Resource Management in Virtualized Data Centers using Application Performance Models Kundu, Sajib 01 April 2013 The rapid growth of virtualized data centers and cloud hosting services is making the management of physical resources such as CPU, memory, and I/O bandwidth in data center servers increasingly important. Server management now involves dealing with multiple dissimilar applications with varying Service-Level Agreements (SLAs) and multiple resource dimensions. The multiplicity and diversity of resources and applications render administrative tasks more complex and challenging. This thesis aimed to develop a framework and techniques that would substantially reduce data center management complexity. We specifically addressed two crucial data center operations. First, we precisely estimated the capacity requirements of client virtual machines (VMs) when renting server space in a cloud environment. Second, we proposed a systematic process to efficiently allocate physical resources to hosted VMs in a data center. To realize these dual objectives, accurately capturing the effects of resource allocations on application performance is vital. Accurate application performance modeling has multiple benefits. Cloud users can size their VMs appropriately and pay only for the resources they need, and service providers can offer a new charging model based on VMs' performance instead of their configured sizes. As a result, clients pay for the performance they actually experience, while administrators can maximize their total revenue by using application performance models and SLAs. This thesis made the following contributions. First, we identified the resource control parameters crucial for distributing physical resources and characterizing contention for virtualized applications in a shared hosting environment.
Second, we explored several modeling techniques and confirmed the suitability of two machine learning tools, Artificial Neural Networks and Support Vector Machines, for accurately modeling the performance of virtualized applications. Moreover, we suggested and evaluated the modeling optimizations needed to improve prediction accuracy when using these tools. Third, we presented an approach to optimal VM sizing that employs the performance models we created. Finally, we proposed a revenue-driven resource allocation algorithm that maximizes the SLA-generated revenue for a data center. Keywords: Virtualization; Resource Management; Data Centers; Machine Learning Techniques; Performance Modeling; VM Sizing; Computer Sciences; OS and Networks; Systems Architecture
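Entry 54 uses performance models for VM sizing and a revenue-driven allocation algorithm. As an assumed illustration (not the thesis's algorithm), `size_vm` picks the smallest allocation whose predicted response time meets an SLA, and `allocate` greedily gives each resource unit to the VM with the largest marginal SLA revenue. The performance model and revenue curves are hypothetical stand-ins for the learned models (such as ANN or SVM) the abstract mentions.

```python
from typing import Callable, Dict, Optional


def size_vm(response_ms: Callable[[int], float], sla_ms: float, max_units: int) -> int:
    """Smallest number of resource units whose predicted response time meets the SLA."""
    for units in range(1, max_units + 1):
        if response_ms(units) <= sla_ms:
            return units
    raise ValueError("SLA cannot be met within max_units")


def allocate(revenue: Dict[str, Callable[[int], float]], capacity: int) -> Dict[str, int]:
    """Greedily hand out unit-sized slices to the VM with the largest marginal revenue.

    This greedy rule is optimal when every revenue curve is concave.
    """
    alloc = {vm: 0 for vm in revenue}
    for _ in range(capacity):
        best_vm: Optional[str] = None
        best_gain = 0.0
        for vm, rev in revenue.items():
            gain = rev(alloc[vm] + 1) - rev(alloc[vm])
            if gain > best_gain:
                best_vm, best_gain = vm, gain
        if best_vm is None:  # no VM earns more from extra resources
            break
        alloc[best_vm] += 1
    return alloc


# Hypothetical performance model: response time halves when CPU units double.
print(size_vm(lambda u: 200.0 / u, sla_ms=50.0, max_units=16))

# Hypothetical concave SLA revenue curves ($/hour) as a function of CPU units.
revenue = {
    "web": lambda u: 10.0 * min(u, 4),
    "batch": lambda u: 6.0 * min(u, 6),
}
print(allocate(revenue, capacity=8))
```

The greedy rule is a common choice because, for separable concave revenue curves under a single capacity limit, it yields an optimal integer allocation; step-shaped SLA penalties would break that property and call for a different search.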
55	Modèles de performance pour l'adaptation des méthodes numériques aux architectures multi-coeurs vectorielles. Application aux schémas Lagrange-Projection en hydrodynamique compressible / Improving numerical methods on recent multi-core processors. Application to Lagrange-Plus-Remap hydrodynamics solver. Gasc, Thibault 06 December 2016 This work focuses on solving problems in compressible fluid mechanics (hydrodynamics). For decades, numerous numerical methods have been developed to deal with this type of problem. However, both the evolution and the complexity of computer architectures push us to rethink and redesign these numerical methods in order to use massively parallel computers efficiently. Using performance models, we analyze a reference Lagrange-Remap (Lagrange-Projection) solver to understand its behavior on current supercomputers in depth and to optimize its implementation for these architectures. Based on the conclusions of this analysis, we propose an alternative formulation of the Remap step and derive a new numerical solver, called Lagrange-Flux, that is designed for better performance. The accuracy obtained with this solver is comparable to that of the reference solver. Keywords: Hydrodynamics; Lagrange-Plus-Remap; Performance modeling; High performance computing
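Entry 55 does not name the performance model it uses. One widely used option, assumed here purely for illustration, is the roofline model, which bounds attainable throughput by the lesser of peak compute and arithmetic intensity times memory bandwidth. The machine figures and the per-cell flop and byte counts are hypothetical.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      flops: float, bytes_moved: float) -> float:
    """Roofline bound: min(peak compute, arithmetic intensity x memory bandwidth)."""
    intensity = flops / bytes_moved  # flop per byte
    return min(peak_gflops, intensity * bandwidth_gbs)


def kernel_time_s(cells: int, flops_per_cell: float, bytes_per_cell: float,
                  peak_gflops: float, bandwidth_gbs: float) -> float:
    perf = attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_cell, bytes_per_cell)
    return cells * flops_per_cell / (perf * 1e9)


# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
PEAK, BW = 1000.0, 100.0
print("ridge point (flop/byte):", PEAK / BW)

# Hypothetical per-cell costs for two solver phases.
low_ai = kernel_time_s(1_000_000, 50.0, 200.0, PEAK, BW)   # 0.25 flop/byte
high_ai = kernel_time_s(1_000_000, 800.0, 40.0, PEAK, BW)  # 20 flop/byte
print(f"low-intensity phase: {low_ai * 1e3:.2f} ms (memory-bound)")
print(f"high-intensity phase: {high_ai * 1e3:.2f} ms (compute-bound)")
```

A phase whose intensity falls below the ridge point is limited by memory traffic, so reducing the bytes moved per cell helps it more than adding compute capacity.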
56	Microstructure Changes In Solid Oxide Fuel Cell Anodes After Operation, Observed Using Three-Dimensional Reconstruction And Microchemical Analysis Parikh, Harshil R. 09 February 2015 No description available. Keywords: Materials Science
57	A Second Generation Generic Systems Simulator (GENESYS) for a Gigascale System-on-a-Chip (SoC). Nugent, Steven Paul 14 April 2005 Future opportunities for gigascale integration will be governed by a hierarchy of theoretical and practical limits that can be codified as follows: fundamental, material, device, circuit, and system. An exponential increase in on-chip integration is making System-on-Chip (SoC) methodologies the dominant design solution for gigascale ICs. Therefore, a second-generation generic systems simulator (GENESYS) is developed to address the need for rapid assessment of technology/architecture trade-offs for multi-billion-transistor SoCs while maintaining the depth of core modeling codified in the hierarchy of limits. A newly developed system methodology incorporates a hierarchical block-based model, a dual interconnect distribution for both local and global interconnects, a generic on-chip bus model, and cell placement algorithms. A comparison of simulation results for five commercial SoC implementations shows increased accuracy in predicting die size, clock frequency, and total power dissipation. ITRS projections for future technology requirements are applied, and the results indicate that increasing static power dissipation is a key impediment to continued improvements in chip performance. Additionally, simulations of a generic chip multiprocessor architecture using several interconnect schemes show that the most promising candidates for future on-chip global interconnect networks are hierarchical bus structures, which provide a high degree of connectivity while maintaining high operating frequencies. Keywords: Chip modeling; Performance modeling; Simulator; Gigascale; GENESYS; System design; Computer simulation
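Entry 57 reports that rising static power is a key impediment to performance gains. The first-order CMOS power model below (dynamic power a*C*V^2*f, static power V*I_leak) shows how leakage can claim a growing share of total power. It is a textbook model, not GENESYS's internal model, and both design points are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ChipPoint:
    name: str
    vdd: float         # supply voltage (V)
    c_switched: float  # total switched capacitance (F)
    activity: float    # average switching activity factor
    freq_hz: float     # clock frequency (Hz)
    i_leak: float      # total leakage current (A)


def power_w(chip: ChipPoint) -> Tuple[float, float]:
    """First-order CMOS estimate: dynamic = a*C*V^2*f, static = V*I_leak."""
    dynamic = chip.activity * chip.c_switched * chip.vdd ** 2 * chip.freq_hz
    static = chip.vdd * chip.i_leak
    return dynamic, static


# Two hypothetical design points; the numbers are illustrative, not ITRS data.
older = ChipPoint("older node", vdd=1.0, c_switched=500e-9, activity=0.1, freq_hz=2e9, i_leak=10.0)
newer = ChipPoint("newer node", vdd=0.9, c_switched=400e-9, activity=0.1, freq_hz=3e9, i_leak=50.0)

for chip in (older, newer):
    dyn, stat = power_w(chip)
    print(f"{chip.name}: dynamic {dyn:.1f} W, static {stat:.1f} W, "
          f"static share {stat / (dyn + stat):.1%}")
```

Lowering the supply voltage cuts dynamic power quadratically, but if leakage current grows, the static share rises even as the clock speeds up.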
58	Performance Modeling Based Scheduling And Rescheduling Of Parallel Applications On Computational Grids Sanjay, H A 10 1900 As computational grids have become popular and ubiquitous, users have access to a large number of different types of geographically distributed grid resources. Many computational grid frameworks are composed of multiple distributed sites, each consisting of one or more dedicated or non-dedicated clusters. Jobs submitted to a grid are handled by a metascheduler, which interacts with the local schedulers of the clusters to schedule jobs on individual clusters. Computational grids have proven to be powerful testbeds for executing various kinds of parallel applications. When a parallel application is submitted to a grid, the metascheduler has to choose a set of resources from a cluster to execute it. To select the best set of resources, it is important to determine the application's performance: accurate performance estimates are essential for a grid metascheduler to schedule user jobs efficiently. Thus, models that predict the execution times of parallel applications on a set of resources, together with a search procedure (scheduling strategy) that selects the best set of machines within a cluster, are important for enabling parallel applications on grids. For efficient execution of large scientific parallel applications consisting of multiple phases, performance models of the individual phases should be obtained. Efficient rescheduling strategies that use these per-phase models to adapt the applications to application and resource dynamics are necessary for maintaining high performance on grids.
A practical and robust grid computing infrastructure that integrates application and resource monitoring, performance modeling, and scheduling and rescheduling techniques is essential for the large-scale deployment and high performance of scientific applications on grid systems, and hence for fostering high-performance computing. This thesis focuses on developing performance models for predicting the execution times of parallel problems/subproblems on dedicated and non-dedicated grid resources. The thesis also constructs robust scheduling and rescheduling strategies in a grid metascheduler that use these performance models for efficient execution of large scientific parallel applications on dynamic grids. Finally, the thesis builds a practical and robust grid middleware infrastructure that integrates performance modeling, scheduling and rescheduling, monitoring, and migration frameworks for the large-scale deployment and use of high-performance applications on grids. The thesis consists of four main components. In the first part, we developed a comprehensive set of performance modeling strategies to predict the execution times of tightly-coupled parallel applications on a set of resources in a dedicated or non-dedicated cluster. The main purpose of these prediction strategies is to aid grid metaschedulers in making scheduling decisions. Our performance modeling strategies, based on linear regression, can deal with non-dedicated systems where loads change during application execution. Our models do not require detailed knowledge or instrumentation of the applications and can be constructed without the involvement of application developers. The strategies are intended for rapid, large-scale deployment of parallel applications on non-dedicated grid systems. We evaluated our strategies on 8-, 16-, 24-, and 32-node clusters with random loads and with load traces from a grid system.
Our performance modeling strategies gave average prediction errors below 30% in all cases, which is reasonable for non-dedicated systems. We also found that scheduling based on these predictions results in perfect scheduling in many cases. To model large-scale scientific applications, we use execution profiles, automatic program analysis, and manual analysis of significant portions of the application's code to identify the application's phases. We then apply our performance modeling strategies to predict execution times for the different phases of tightly-coupled parallel applications on a set of resources in a dedicated or non-dedicated cluster. Our experiments show that combining performance models of the individual phases gives 18%–70% more accurate predictions than using a single performance model per application. In the second part of the thesis, we devised, evaluated, and compared algorithms for scheduling tightly-coupled parallel applications on multi-cluster grids. Our algorithms use performance models that predict the execution times of parallel applications to evaluate candidate schedules. In this work, we propose a novel algorithm called Box Elimination (BE) that searches a space of performance model parameters to determine efficient schedules. By eliminating, at each step, large regions of the search space that contain poorer solutions and searching for high-quality solutions, our algorithm generates efficient schedules within a few seconds, even for clusters of 512 processors. Through a large number of real and simulation experiments, we compared our algorithm with popular optimization techniques. We show that our algorithm generates schedules up to 80% more efficient than those of other algorithms, and that the resulting execution times are more robust against performance modeling errors.
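In entry 58, candidate schedules are evaluated with a performance model that predicts execution time on a given set of resources. The sketch below assumes a hypothetical model T(p) = a + b(1 + load)/p + c*p and simply tries the p least-loaded nodes for every p. It illustrates model-driven schedule selection by exhaustive evaluation; it is not the Box Elimination algorithm, whose search procedure the abstract does not detail.

```python
from typing import List, Sequence, Tuple


def predicted_time(p: int, mean_load: float, a: float, b: float, c: float) -> float:
    # Hypothetical model: serial part + parallel part slowed by load + per-process overhead.
    return a + b * (1.0 + mean_load) / p + c * p


def best_schedule(loads: Sequence[float],
                  coeffs: Tuple[float, float, float]) -> Tuple[List[int], float]:
    """Evaluate schedules made of the p least-loaded nodes for every p; keep the fastest."""
    order = sorted(range(len(loads)), key=lambda i: loads[i])
    best_nodes: List[int] = []
    best_t = float("inf")
    for p in range(1, len(loads) + 1):
        chosen = order[:p]
        mean_load = sum(loads[i] for i in chosen) / p
        t = predicted_time(p, mean_load, *coeffs)
        if t < best_t:
            best_nodes, best_t = chosen, t
    return sorted(best_nodes), best_t


# Hypothetical per-node loads and fitted model coefficients (a, b, c).
nodes, t = best_schedule([0.0, 0.5, 0.0, 2.0, 0.1, 0.0], (5.0, 100.0, 2.0))
print(nodes, round(t, 2))
```

With these numbers, adding the heavily loaded node 3 would raise the predicted time, so the fastest schedule uses five of the six nodes.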
The third part of the thesis deals with policies for rescheduling long-running multi-phase parallel applications in response to application and resource dynamics. In this work, we use our performance modeling and scheduling strategies to derive rescheduling plans for executing multi-phase parallel applications on grids. A rescheduling plan consists of potential points in the application's execution at which to reschedule, together with the resource schedules to use between two consecutive rescheduling points. We developed three algorithms for deriving a rescheduling plan for a parallel application execution: an incremental algorithm, a divide-and-conquer algorithm, and a genetic algorithm. We also developed an algorithm that combines rescheduling plans derived on different clusters into a single coherent rescheduling plan for application execution on a grid consisting of multiple clusters. The rescheduling plans generated by our algorithms are highly efficient, yielding application execution times less than 10% higher than those obtained with a brute-force method. We also find that rescheduling in response to changing application and resource dynamics, using the rescheduling plans our algorithms generate for multi-cluster grids, yields much lower execution times than running the applications on a single schedule throughout their execution. In the final part of the thesis, we developed a practical grid middleware framework called MerITA (Middleware for Performance Improvement of Tightly Coupled Parallel Applications on Grids), a system for the effective execution of tightly-coupled parallel applications on multi-cluster grids consisting of dedicated or non-dedicated, interactive or batch systems.
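A rescheduling plan, as defined in entry 58, assigns a schedule to each phase and reschedules wherever the schedule changes. The thesis derives plans with incremental, divide-and-conquer, and genetic algorithms; the sketch below swaps in a different, exact technique, dynamic programming, assuming that per-phase predicted times are known for each candidate schedule and that every switch costs a fixed, hypothetical amount (for checkpointing and migration).

```python
from typing import List, Sequence, Tuple


def plan(phase_times: Sequence[Sequence[float]], switch_cost: float) -> Tuple[List[int], float]:
    """phase_times[k][s]: predicted time of phase k on candidate schedule s.

    Returns the schedule index per phase and the total predicted time,
    charging switch_cost whenever consecutive phases use different schedules.
    """
    n_sched = len(phase_times[0])
    best = list(phase_times[0])  # best[s]: min time to finish phase 0 on schedule s
    back: List[List[int]] = []   # back[k-1][s]: schedule used before phase k on s
    for k in range(1, len(phase_times)):
        new, choice = [], []
        for s in range(n_sched):
            cost = lambda q: best[q] + (0.0 if q == s else switch_cost)
            prev = min(range(n_sched), key=cost)
            new.append(cost(prev) + phase_times[k][s])
            choice.append(prev)
        best = new
        back.append(choice)
    last = min(range(n_sched), key=lambda s: best[s])
    total = best[last]
    # Walk the back-pointers to recover the schedule for every phase.
    seq = [last]
    for choice in reversed(back):
        seq.append(choice[seq[-1]])
    seq.reverse()
    return seq, total


# Hypothetical predictions: 3 phases on 2 candidate schedules, switch cost 5.
print(plan([[10, 30], [50, 20], [40, 15]], switch_cost=5.0))
```

In this example, rescheduling after the first phase pays off despite the switch cost: the plan totals 50 time units, versus 100 or 65 when staying on one schedule throughout.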
The framework brings together performance modeling for automatically determining the characteristics of parallel applications; scheduling strategies that use the performance models to map applications to resources efficiently; rescheduling policies that determine the points in an application's execution at which it can be rescheduled to a different set of resources to improve performance; and a checkpointing library that enables rescheduling. Keywords: Computational Grids; Performance Modeling; Scheduling; Rescheduling; Grid Computing; Scheduling Algorithms; Rescheduling Algorithms; Grid Scheduling; Grids; Tightly-Coupled Parallel Applications; Computer Science
59	Building Energy-efficient Edge Systems Tumkur Ramesh Babu, Naveen January 2020 No description available. Keywords: Computer Science
60	Algorithms for Layout-Aware and Performance Model Driven Synthesis of Analog Circuits Agarwal, Anuradha January 2005 No description available. Keywords: Analog Radio-frequency Circuit Synthesis; Layout Parasitics; Performance Modeling; Parasitic Estimation and Modeling; Layout-Aware Synthesis; Circuit sizing; Parasitic Corners; Yield Optimization; Parasitic Capacitances; Dynamic Performance Macromodel
