Global ETD Search

1	Knowledge support for parallel performance data mining Huck, Kevin A., 1972- 03 1900 (has links) xvi, 231 p. : ill. A print copy of this thesis is available through the UO Libraries. Search the library catalog for the location and call number. / Parallel applications running on high-end computer systems manifest a complex combination of performance phenomena, such as communication patterns, work distributions, and computational inefficiencies. Current performance tools compute results that help to describe performance behavior, as well as to understand performance problems and how they came about. Unfortunately, parallel performance tool research has been limited in its contributions to large-scale performance data management and analysis, automated performance investigation, and knowledge-based performance problem reasoning. This dissertation discusses the design of a performance analysis methodology and framework which integrates scalable data management, dimension reduction, clustering, classification and correlation analysis of individual trials of large dimensions, and comparative analysis between multiple application executions. Analysis process workflows can be captured, automating what would otherwise be time-consuming and possibly error prone tasks. More importantly, process automation provides an extensible interface to the analysis process. The methods also integrate context metadata and a rule-based system in order to capture expert performance analysis knowledge about known anomalous behavior patterns. Applying this knowledge to performance analysis results and associated metadata provides a mechanism for diagnosing the causes of performance problems, rather than just summarizing results. Our prototype implementation of our data mining framework, PerfExplorer, and our data management framework, PerfDMF, are applied in large-scale performance studies to demonstrate each thesis contribution. The dissertation concludes with a discussion of future research directions. / Adviser: Allen D. Malony Parallel performance Data mining Dimension reduction Clustering Computer science
2	Evaluations of the parallel extensions in .NET 4.0 Islam, Md. Rashedul, Islam, Md. Rofiqul, Mazumder, Tahidul Arafhin January 2011 (has links) Parallel programming or making parallel application is a great challenging part of computing research. The main goal of parallel programming research is to improve performance of computer applications. A well-structured parallel application can achieve better performance in terms of execution speed over sequential execution on existing and upcoming parallel computer architecture. This thesis named "Evaluations of the parallel extensions in .NET 4.0" describes the experimental evaluation of different parallel application performance with thread-safe data structure and parallel constructions in .NET Framework 4.0. Described different performance issues of this thesis help to make efficient parallel application for better performance. Before describing the experimental evaluation, this thesis describes some methodologies relevant to parallel programming like Parallel computer architecture, Memory architectures, Parallel programming models, decomposition, threading etc. It describes the different APIs in .NET Framework 4.0 and the way of coding for making an efficient parallel application in different situations. It also presents some implementations of different parallel constructs or APIs like Static Multithreading, Using ThreadPool, Task, Parallel.For, Parallel.ForEach, PLINQ etc. The evaluation of parallel application has been done by experimental result evaluation and performance measurements. In most of the cases, the result evaluation shows better performance of parallelism like less execution time and increase CPU uses over traditional sequential execution. In addition parallel loop doesn’t show better performance in case of improper partitioning, oversubscription, improper workloads etc. The discussion about proper partitioning, oversubscription and proper work load balancing will help to make more efficient parallel application. / Program: Magisterutbildning i informatik parallel application decomposition multithreading parallel frameworks data parallelism task parallelism parallel performance Engineering and Technology Teknik och teknologier
3	A Study of Improving the Parallel Performance of VASP. Baker, Matthew Brandon 13 August 2010 (has links) (PDF) This thesis involves a case study in the use of parallelism to improve the performance of an application for computational research on molecules. The application, VASP, was migrated from a machine with 4 nodes and 16 single-threaded processors to a machine with 60 nodes and 120 dual-threaded processors. When initially migrated, VASP's performance deteriorated after about 17 processing elements (PEs), due to network contention. Subsequent modifications that restrict communication amongst VASP processes, together with additional support for threading, allowed VASP to scale up to 112 PEs, the maximum number that was tested. Other performance-enhancing optimizations that were attempted included replacing old libraries, which produced improvements of about 10%, and prefetching, which degraded, rather than enhanced, VASP performance. cluster parallel performance openmp mpi HPC VASP Computer Sciences Physical Sciences and Mathematics Systems Architecture
4	Modeling and Runtime Systems for Coordinated Power-Performance Management Li, Bo 28 January 2019 (has links) Emergent systems in high-performance computing (HPC) expect maximal efficiency to achieve the goal of power budget under 20-40 megawatts for 1 exaflop set by the Department of Energy. To optimize efficiency, emergent systems provide multiple power-performance control techniques to throttle different system components and scale of concurrency. In this dissertation, we focus on three throttling techniques: CPU dynamic voltage and frequency scaling (DVFS), dynamic memory throttling (DMT), and dynamic concurrency throttling (DCT). We first conduct an empirical analysis of the performance and energy trade-offs of different architectures under the throttling techniques. We show the impact on performance and energy consumption on Intel x86 systems with accelerators of Intel Xeon Phi and a Nvidia general-purpose graphics processing unit (GPGPU). We show the trade-offs and potentials for improving efficiency. Furthermore, we propose a parallel performance model for coordinating DVFS, DMT, and DCT simultaneously. We present a multivariate linear regression-based approach to approximate the impact of DVFS, DMT, and DCT on performance for performance prediction. Validation using 19 HPC applications/kernels on two architectures (i.e., Intel x86 and IBM BG/Q) shows up to 7% and 17% prediction error correspondingly. Thereafter, we develop the metrics for capturing the performance impact of DVFS, DMT, and DCT. We apply the artificial neural network model to approximate the nonlinear effects on performance impact and present a runtime control strategy accordingly for power capping. Our validation using 37 HPC applications/kernels shows up to a 20% performance improvement under a given power budget compared with the Intel RAPL-based method. / Ph. D. / System efficiency on high-performance computing (HPC) systems is the key to achieving the goal of power budget for exascale supercomputers. Techniques for adjusting the performance of different system components can help accomplish this goal by dynamically controlling system performance according to application behaviors. In this dissertation, we focus on three techniques: adjusting CPU performance, memory performance, and the number of threads for running parallel applications. First, we profile the performance and energy consumption of different HPC applications on both Intel systems with accelerators and IBM BG/Q systems. We explore the trade-offs of performance and energy under these techniques and provide optimization insights. Furthermore, we propose a parallel performance model that can accurately capture the impact of these techniques on performance in terms of job completion time. We present an approximation approach for performance prediction. The approximation has up to 7% and 17% prediction error on Intel x86 and IBM BG/Q systems respectively under 19 HPC applications. Thereafter, we apply the performance model in a runtime system design for improving performance under a given power budget. Our runtime strategy achieves up to 20% performance improvement to the baseline method. Parallel Performance Modeling Dynamic Voltage and Frequency Scaling Dynamic Memory Throttling Dynamic Concurrency Throttling Shared-Memory Systems
5	Overlapping of Communication and Computation and Early Binding: Fundamental Mechanisms for Improving Parallel Performance on Clusters of Workstations Dimitrov, Rossen Petkov 12 May 2001 (has links) This study considers software techniques for improving performance on clusters of workstations and approaches for designing message-passing middleware that facilitate scalable, parallel processing. Early binding and overlapping of communication and computation are identified as fundamental approaches for improving parallel performance and scalability on clusters. Currently, cluster computers using the Message-Passing Interface for interprocess communication are the predominant choice for building high-performance computing facilities, which makes the findings of this work relevant to a wide audience from the areas of high-performance computing and parallel processing. The performance-enhancing techniques studied in this work are presently underutilized in practice because of the lack of adequate support by existing message-passing libraries and are also rarely considered by parallel algorithm designers. Furthermore, commonly accepted methods for performance analysis and evaluation of parallel systems omit these techniques and focus primarily on more obvious communication characteristics such as latency and bandwidth. This study provides a theoretical framework for describing early binding and overlapping of communication and computation in models for parallel programming. This framework defines four new performance metrics that facilitate new approaches for performance analysis of parallel systems and algorithms. This dissertation provides experimental data that validate the correctness and accuracy of the performance analysis based on the new framework. The theoretical results of this performance analysis can be used by designers of parallel system and application software for assessing the quality of their implementations and for predicting the effective performance benefits of early binding and overlapping. This work presents MPI/Pro, a new MPI implementation that is specifically optimized for clusters of workstations interconnected with high-speed networks. This MPI implementation emphasizes features such as persistent communication, asynchronous processing, low processor overhead, and independent message progress. These features are identified as critical for delivering maximum performance to applications. The experimental section of this dissertation demonstrates the capability of MPI/Pro to facilitate software techniques that result in significant application performance improvements. Specific demonstrations with Virtual Interface Architecture and TCP/IP over Ethernet are offered. performance metrics cluster computing message-passing middleware parallel processing parallel performance MPI early binding
6	Data Transformation Trajectories in Embedded Systems Kasinathan, Gokulnath January 2016 (has links) Mobile phone tracking is the ascertaining of the position or location of a mobile phone when moving from one place to another place. Location Based Services Solutions include Mobile positioning system that can be used for a wide array of consumer-demand services like search, mapping, navigation, road transport traffic management and emergency-call positioning. The Mobile Positioning System (MPS) supports complementary positioning methods for 2G, 3G and 4G/LTE (Long Term Evolution) networks. Mobile phone is popularly known as an UE (User Equipment) in LTE. A prototype method of live trajectory estimation for massive UE in LTE network has been proposed in this thesis work. RSRP (Reference Signal Received Power) values and TA(Timing Advance) values are part of LTE events for UE. These specific LTE events can be streamed to a system from eNodeB of LTE in real time by activating measurements on UEs in the network. AoA (Angle of Arrival) and TA values are used to estimate the UE position. AoA calculation is performed using RSRP values. The calculated UE positions are filtered using Particle Filter(PF) to estimate trajectory. To obtain live trajectory estimation for massive UEs, the LTE event streamer is modelled to produce several task units with events data for massive UEs. The task level modelled data structures are scheduled across Arm Cortex A15 based MPcore, with multiple threads. Finally, with massive UE live trajectory estimation, IMSI (International mobile subscriber identity) is used to maintain hidden markov requirements of particle filter functionality while maintaining load balance for 4 Arm A15 cores. This is proved by serial and parallel performance engineering. Future work is proposed for Decentralized task level scheduling with hash function for IMSI with extension of cores and Concentric circles method for AoA accuracy. / Mobiltelefoners positionering är välfungerande för positionslokalisering av mobiltelefoner när de rör sig från en plats till en annan. Lokaliseringstjänsterna inkluderar mobil positionering system som kan användas till en mängd olika kundbehovs tjänster som sökning av position, position i kartor, navigering, vägtransporters trafik managering och nödsituationssamtal med positionering. Mobil positions system (MPS) stödjer komplementär positions metoder för 2G, 3G och 4G/LTE (Long Term Evolution) nätverk. Mobiltelefoner är populärt känd som UE (User Equipment) inom LTE. En prototypmetod med verkliga rörelsers estimering för massiv UE i LTE nätverk har blivit föreslagen för detta examens arbete. RSRP (Reference Signal Received Power) värden och TA (Timing Advance) värden är del av LTE händelser för UE. Dessa specifika LTE event kan strömmas till ett system från eNodeB del av LTE, i realtid genom aktivering av mätningar på UEar i nätverk. AoA (Angel of Arrival) och TA värden är använt för att beräkna UEs position. AoA beräkningar är genomförda genom användandet av RSRP värden. Den kalkylerade UE positionen är filtrerad genom användande av Particle Filter (PF) för att estimera rörelsen. För att identifiera verkliga rörelser, beräkningar för massiva UEs, LTE event streamer är modulerad att producera flera uppgifts enheter med event data från massiva UEar. De tasks modulerade data strukturerna är planerade över Arm Cortex A15 baserade MPcore, med multipla trådar. Slutligen, med massiva UE verkliga rörelser, beräkningar med IMSI(International mobile subscriber identity) är använt av den Hidden Markov kraven i Particle Filter’s funktionalitet medans kravet att underhålla last balansen för 4 Arm A15 kärnor. Detta är utfört genom seriell och parallell prestanda teknik. Framtida arbeten för decentraliserade task nivå skedulering med hash funktion för IMSI med utökning av kärnor och Concentric circles metod för AoA noggrannhet. Angle of Arrival Hidden Markov Model Particle Filter Arm A15 MPcore Parallel Programming Real time task level scheduler Angle of Arrival Hidden Markov Model Particle Filter Arm A15 MPcore Parallel Programming Real time task level scheduler Computer and Information Sciences Data- och informationsvetenskap
7	Extending the Functionality of Score-P through Plugins: Interfaces and Use Cases Schöne, Robert, Tschüter, Ronny, Ilsche, Thomas, Schuchart, Joseph, Hackenberg, Daniel, Nagel, Wolfgang E. 18 October 2017 (has links) (PDF) Performance measurement and runtime tuning tools are both vital in the HPC software ecosystem and use similar techniques: the analyzed application is interrupted at specific events and information on the current system state is gathered to be either recorded or used for tuning. One of the established performance measurement tools is Score-P. It supports numerous HPC platforms and parallel programming paradigms. To extend Score-P with support for different back-ends, create a common framework for measurement and tuning of HPC applications, and to enable the re-use of common software components such as implemented instrumentation techniques, this paper makes the following contributions: (I) We describe the Score-P metric plugin interface, which enables programmers to augment the event stream with metric data from supplementary data sources that are otherwise not accessible for Score-P. (II) We introduce the flexible Score-P substrate plugin interface that can be used for custom processing of the event stream according to the specific requirements of either measurement, analysis, or runtime tuning tasks. (III) We provide examples for both interfaces that extend Score-P’s functionality for monitoring and tuning purposes. Score-P Performance Metriken Performance Monitoring Software-Erweiterungen Software-Schnittstellen Energieeffizienz-Tuning Hochleistungsrechnen HPC Score-P parallel performance tools performance metrics performance monitoring plugins interfaces energy efficiency tuning high performance computing HPC ddc:004 rvk:ST 151 rvk:ST 233
8	Extending the Functionality of Score-P through Plugins: Interfaces and Use Cases Schöne, Robert, Tschüter, Ronny, Ilsche, Thomas, Schuchart, Joseph, Hackenberg, Daniel, Nagel, Wolfgang E. 18 October 2017 (has links) Performance measurement and runtime tuning tools are both vital in the HPC software ecosystem and use similar techniques: the analyzed application is interrupted at specific events and information on the current system state is gathered to be either recorded or used for tuning. One of the established performance measurement tools is Score-P. It supports numerous HPC platforms and parallel programming paradigms. To extend Score-P with support for different back-ends, create a common framework for measurement and tuning of HPC applications, and to enable the re-use of common software components such as implemented instrumentation techniques, this paper makes the following contributions: (I) We describe the Score-P metric plugin interface, which enables programmers to augment the event stream with metric data from supplementary data sources that are otherwise not accessible for Score-P. (II) We introduce the flexible Score-P substrate plugin interface that can be used for custom processing of the event stream according to the specific requirements of either measurement, analysis, or runtime tuning tasks. (III) We provide examples for both interfaces that extend Score-P’s functionality for monitoring and tuning purposes. info:eu-repo/classification/ddc/004 ddc:004

Search results