151. Empirical Studies of Performance Bugs and Performance Analysis Approaches for Software Systems. Zaman, Shahed (30 April 2012)
Developing high quality software is of paramount importance for keeping existing customers satisfied and remaining competitive. One of the most important software quality characteristics is performance, which defines how fast and how efficiently a software system can perform its operations.
While several studies have shown that field problems are often due to performance issues rather than feature bugs, prior research typically treats all bugs as similar when studying various aspects of software quality (e.g., predicting the time to fix a bug) or focuses on other types of bugs (e.g., security bugs). There is little work that studies performance bugs.
In this thesis, we perform an empirical study to quantitatively and qualitatively examine performance bugs in the Mozilla Firefox and Google Chrome web browser projects in order to find out if performance bugs are really different from other bugs in practice and to understand the rationale behind those differences.
In our quantitative study, we find that performance bugs in the Firefox project take longer to fix, are fixed by more experienced developers, and require changes to more lines of code. We also study performance bugs relative to security bugs, since security bugs have been extensively studied separately in the past. We find that, in the Firefox project, security bugs are re-opened and tossed more often, are fixed and triaged faster, are fixed by more experienced developers, and are assigned more developers. The Google Chrome project also shows quantitative differences between performance and non-performance bugs, and these differ from the Firefox project's.
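The abstract does not name the statistical machinery behind these comparisons; as a hedged illustration only, skewed fix-time distributions of two bug populations are often compared with a non-parametric test such as Mann-Whitney U (the sample data below is invented):

```python
# Illustrative sketch only: the abstract does not specify the test used;
# Mann-Whitney U is one common choice for skewed fix-time data.
from scipy.stats import mannwhitneyu

perf_fix_days = [12, 45, 90, 33, 150, 60, 210]   # hypothetical fix times (days), performance bugs
other_fix_days = [3, 10, 25, 7, 40, 15, 22]      # hypothetical fix times, other bugs

stat, p_value = mannwhitneyu(perf_fix_days, other_fix_days, alternative="greater")
print(f"U={stat}, p={p_value:.4f}")  # a small p supports "performance bugs take longer to fix"
```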
Based on our quantitative results, we examine the data from a qualitative point of view. Among our most interesting observations, we find that end-users are often frustrated with performance problems and often threaten to switch to competing software products.
To better understand why some users are very frustrated (even threatening to switch products) even though most systems are well tested, we performed an additional study. In this final study, we explore a global versus a user-centric perspective on analyzing performance data. We find that a user-centric perspective might reveal a small number of users with considerably poor performance, while the global perspective might show good or unchanged performance across releases.
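A minimal sketch of the contrast, with invented numbers rather than the thesis's data: a global summary statistic can stay flat across releases while a per-user comparison exposes the small group whose experience degraded.

```python
# Invented per-user mean response times (ms) across two releases; not thesis data.
from statistics import median

release_a = {"u1": 100, "u2": 110, "u3": 95, "u4": 105, "u5": 102}
release_b = {"u1": 98, "u2": 108, "u3": 93, "u4": 103, "u5": 340}   # u5 regressed badly

# Global perspective: the population-wide median looks unchanged across releases.
print(median(release_a.values()), median(release_b.values()))       # 102 103

# User-centric perspective: compare each user against their own baseline.
regressed = {u: (release_a[u], release_b[u])
             for u in release_a if release_b[u] > 1.5 * release_a[u]}
print(regressed)   # {'u5': (102, 340)}: a small set of users sees much worse performance
```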
The results of our studies show that performance bugs are different and should be studied separately in large scale software systems to improve the quality assurance processes related to software performance. Thesis (Master, Computing), Queen's University, 2012.
152. Be My Guest: Nation branding and national representation in the Eurovision Song Contest. Meijer, Albert (January 2013)
Since its inception in 1956, the Eurovision Song Contest has been a stage for national representation and an opportunity for countries to brand themselves. The 2012 Eurovision Song Contest in Baku, Azerbaijan is a prime example of nation branding, both for the host country and for the participating countries. Hosting the event gives a country the opportunity to present a specific nation brand, but the other countries also have opportunities to present a national image, even though they only have the three-minute time-frame of their performance. These performances are the main subject of this thesis, whose main question is: how do nation-states use the Eurovision Song Contest as a means of nation branding? To answer this question, I use three sub-questions. First, I focus on the concept of identity: how does musical performance represent national and European identity in the context of the Eurovision Song Contest? Secondly, I study the translation of national identity into an image that should appeal to all of Europe, by creating a specific nation brand: how do nations use nation branding through culture as a tool to build an appealing image within the context of Eurovision? Lastly, I study the performance of these nation brands in specific cases during the 2012 Eurovision Song Contest: how is a nation-branded image performed in the Eurovision Song Contest? The first two chapters of my thesis consist of an analysis of literature on identity and nation branding in combination with national representation in Eurovision. My third and last chapter consists of performance analyses of 2012 participants, focusing on the performances of Romania, Russia, Ukraine and Montenegro, which in 2012 were some of the richest performances in terms of symbolism concerning national representation.
153. The fast multipole method at exascale. Chandramowlishwaran, Aparna (13 January 2014)
This thesis presents a top-to-bottom analysis of designing and implementing fast algorithms for current and future systems. We present new analysis, algorithmic techniques, and implementations of the Fast Multipole Method (FMM) for solving N-body problems. We target the FMM because it is broadly applicable to a variety of scientific particle simulations used to study electromagnetic, fluid, and gravitational phenomena, among others. Importantly, the FMM has asymptotically optimal time complexity with guaranteed approximation accuracy. As such, it is among the most attractive solutions for scalable particle simulation on future extreme scale systems.
We specifically address two key challenges. The first challenge is how to engineer fast code for today's platforms. We present the first in-depth study of multicore optimizations and tuning for the FMM, along with a systematic approach for transforming a conventionally-parallelized FMM into a highly-tuned one. We introduce novel optimizations that significantly improve the within-node scalability of the FMM, thereby enabling high performance on multicore and manycore systems. The second challenge is how to understand scalability on future systems. We present a new algorithmic complexity analysis of the FMM that considers both intra- and inter-node communication costs. Using these models, we present results for choosing the optimal algorithmic tuning parameter. This analysis also yields the surprising prediction that although the FMM is largely compute-bound today, and therefore highly scalable on current systems, the trajectory of processor architecture designs, absent significant changes, could cause it to become communication-bound as early as the year 2015. This prediction suggests the utility of our analysis approach, which directly relates algorithmic and architectural characteristics, for enabling a new kind of high-level algorithm-architecture co-design.
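The compute-bound versus communication-bound prediction can be illustrated with a back-of-the-envelope model; this is not the thesis's actual model, and every constant below is invented: per-node compute throughput has historically grown faster than network bandwidth, so an algorithm's fixed flops-per-byte ratio is eventually overtaken by the machine balance.

```python
# Toy illustration of an algorithm/architecture crossover analysis.
# All constants are invented; the thesis develops a far more detailed model.
flops_per_byte_fmm = 50.0   # assumed arithmetic intensity (flops per byte communicated)

peak_gflops = 100.0         # assumed per-node peak in the base year
bandwidth_gbs = 5.0         # assumed per-node network bandwidth in the base year
flops_growth = 1.6          # assumed yearly growth of compute throughput
bw_growth = 1.2             # assumed yearly growth of bandwidth

year = 2010
while year < 2030:
    machine_balance = peak_gflops / bandwidth_gbs   # flops a node can do per byte it can move
    if machine_balance > flops_per_byte_fmm:
        print(f"communication-bound from ~{year}")  # compute has outpaced the network
        break
    peak_gflops *= flops_growth
    bandwidth_gbs *= bw_growth
    year += 1
```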
To demonstrate the scientific significance of the FMM, we present two applications: direct simulation of blood, which is a multi-scale multi-physics problem, and large-scale biomolecular electrostatics. MoBo (Moving Boundaries) is the infrastructure for the direct numerical simulation of blood. It comprises two key algorithmic components, of which the FMM is one. We were able to simulate blood flow using Stokesian dynamics on 200,000 cores of Jaguar, a petaflop system, and achieve a sustained performance of 0.7 Petaflop/s. The second application, which we propose as future work in this thesis, is biomolecular electrostatics, where we solve for the electrical potential using the boundary-integral formulation discretized with boundary element methods (BEM). The computational kernel in solving the large linear system is a dense matrix-vector multiply, which we propose can be calculated using our scalable FMM. We propose to begin with the two-dielectric problem, where the electrostatic field is calculated using two continuum dielectric media, the solvent and the molecule. This is only a first step towards solving biologically challenging problems, which have more than two dielectric media, ion-exclusion layers, and solvent-filled cavities.
Finally, given the difficulty of producing high-performance scalable code, productivity is a key concern. Recently, numerical algorithms have been redesigned to take advantage of the architectural features of emerging multicore processors. These new classes of algorithms express fine-grained asynchronous parallelism and hence reduce the cost of synchronization. We performed the first extensive performance study of a recently proposed parallel programming model, called Concurrent Collections (CnC). In CnC, the programmer expresses her computation in terms of application-specific operations, partially ordered by semantic scheduling constraints. The CnC model is well-suited to expressing asynchronous-parallel algorithms, so we evaluate CnC using two dense linear algebra algorithms in this style for execution on state-of-the-art multicore systems. Our implementations in CnC were able to match, and in some cases even exceed, competing vendor-tuned and domain-specific library codes. We combine these two distinct research efforts by expressing the FMM in CnC; this approach tries to marry performance with productivity, which will be critical on future systems. Looking forward, we would like to extend this work to distributed memory machines, specifically to implement the FMM in the new distributed CnC (distCnC) to express fine-grained parallelism, which would require significant effort in alternative models.
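CnC has its own notation and runtime; purely as a language-neutral illustration of "operations partially ordered by semantic scheduling constraints" (this is not CnC's API), a generic topological scheduler captures the idea:

```python
# Not CnC: a generic sketch of executing application-defined steps under
# a partial order, the scheduling idea that the CnC model generalizes.
from graphlib import TopologicalSorter

# Steps of a toy computation; edges encode only semantic dependencies,
# leaving a runtime free to run independent steps concurrently.
deps = {
    "factor_tile_00": set(),
    "update_tile_01": {"factor_tile_00"},
    "update_tile_02": {"factor_tile_00"},
    "factor_tile_11": {"update_tile_01", "update_tile_02"},
}

ts = TopologicalSorter(deps)
ts.prepare()
while ts.is_active():
    ready = list(ts.get_ready())   # all steps whose constraints are satisfied
    print("can run concurrently:", ready)
    for step in ready:             # a real runtime would dispatch these to workers
        ts.done(step)
```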
154. Performance analysis of greedy subchannel allocation schemes for adaptive OFDMA systems. Lv, Nan (23 August 2010)
Subchannel allocation schemes for OFDMA (orthogonal frequency division multiple access) systems with adaptive transmission are investigated. The analysis involves two scenarios: single-hop transmission and dual-hop relaying transmission. The overall objective is to use the minimum system resource (in terms of subchannels) while satisfying scheduled users' data rate requirements under a certain error rate performance restriction. The ordered subchannel selection with adaptive modulation scheme is applied to both cases. In the single-hop case, a low-complexity greedy subchannel allocation algorithm with two options is proposed, depending on whether a common modulation mode is applied on all selected subchannels or not. The resulting two options are termed GS-FA (greedy selection with full adaptivity) and GS-LA (greedy selection with limited adaptivity), respectively. In the dual-hop relaying case, two relaying schemes are considered: Amplify-and-Forward (AF) and Decode-and-Forward (DF). Based on this, four different combinations of relaying schemes and ordered subchannel selection with adaptive modulation are investigated: DF with full adaptivity (DF-FA), AF with full adaptivity (AF-FA), AF with limited adaptivity (AF-LA), and AF with limited adaptivity and subchannel mapping (AF-LA with SCM). For both the single-hop and dual-hop relaying cases, the performance of the proposed schemes is evaluated through mathematical analysis and compared with selected numerical results.
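The abstract does not spell the algorithm out; as a rough sketch under assumed details (the modulation rate table and SNR thresholds below are invented), a greedy ordered-selection allocator picks the strongest subchannels first and stops once the scheduled user's rate requirement is met:

```python
# A minimal sketch of greedy ordered subchannel selection, assuming
# details not given in the abstract (rate table and SNR thresholds invented).
def greedy_select(subchannel_snrs, rate_required):
    """Pick strongest subchannels first; on each, use the highest
    modulation its SNR supports (the 'full adaptivity' flavour)."""
    # (min SNR in dB, bits per symbol) for assumed adaptive modulation modes
    modes = [(22.0, 6), (16.0, 4), (10.0, 2), (4.0, 1)]
    chosen, rate = [], 0
    for idx, snr in sorted(enumerate(subchannel_snrs), key=lambda p: -p[1]):
        bits = next((b for thr, b in modes if snr >= thr), 0)
        if bits == 0:
            break                      # remaining subchannels are even weaker
        chosen.append((idx, bits))
        rate += bits
        if rate >= rate_required:      # goal: a minimum number of subchannels
            return chosen
    return None                        # requirement cannot be met

print(greedy_select([12.5, 23.1, 8.0, 17.4, 3.2], rate_required=10))
```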
155. Decomposition of general queueing network models: an investigation into the implementation of hierarchical decomposition schemes of general closed queueing network models using the principle of minimum relative entropy subject to fully decomposable constraints. Tomaras, Panagiotis J. (1989)
Decomposition methods based on the hierarchical partitioning of the state space of queueing network models offer powerful evaluation tools for the performance analysis of computer systems and communication networks. Conventionally implemented, these methods capture the exact solution of separable queueing network models, but their credibility differs when applied to general queueing networks. This thesis provides a universal information theoretic framework for the implementation of hierarchical decomposition schemes, based on the principle of minimum relative entropy given fully decomposable subset and aggregate utilization, mean queue length and flow-balance constraints. This principle is used, in conjunction with asymptotic connections to infinite capacity queues, to derive new closed form approximations for the conditional and marginal state probabilities of general queueing network models. The minimum relative entropy solutions are implemented iteratively at each decomposition level, involving the generalized exponential (GE) distributional model in approximating the general service and asymptotic flow processes in the network. It is shown that the minimum relative entropy joint state probability, subject to mean queue length and flow-balance constraints, is identical to the exact product-form solution obtained as if the network were separable. An investigation into the effect of different couplings of the resource units on the relative accuracy of the approximation is carried out, based on extensive experimentation. The credibility of the method is demonstrated with some illustrative examples involving first-come-first-served general queueing networks with single and multiple servers, and favourable comparisons against exact solutions and other approximations are made.
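For orientation, and not quoted from the thesis, the standard minimum relative entropy formulation reads: among all distributions satisfying the imposed constraints, choose the one closest to a prior distribution q in the information-theoretic sense.

```latex
% Background sketch of the minimum relative entropy principle
% (standard formulation, not quoted from the thesis).
\min_{p} \sum_{n} p(n)\,\ln\frac{p(n)}{q(n)}
\quad\text{subject to}\quad
\sum_{n} p(n) = 1,\qquad
\sum_{n} f_k(n)\,p(n) = \langle f_k \rangle,\quad k = 1,\dots,m.
% The minimiser has the exponential (Gibbs) form
p(n) = \frac{q(n)}{Z}\,\exp\!\Big(-\sum_{k=1}^{m} \beta_k\, f_k(n)\Big),
% with the Lagrange multipliers \beta_k fixed by the constraints
% and Z a normalising constant.
```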
156. General queueing network models for computer system performance analysis: a maximum entropy method of analysis and aggregation of general queueing network models with application to computer systems. El-Affendi, Mohamed Ahmed (1983)
In this study the maximum entropy formalism [JAYN 57] is suggested as an alternative theory for the general queueing systems of computer performance analysis. The motivation is to overcome some of the problems arising in this field and to extend the scope of the results derived in the context of Markovian queueing theory. For the M/G/1 model a unique maximum entropy solution satisfying local balance is derived, independent of any assumptions about the service time distribution. However, it is shown that this solution is identical to the steady state solution of the underlying Markov process when the service time distribution is of the generalised exponential (GE) type. (The GE-type distribution is a mixture of an exponential term and a unit impulse function at the origin.) For the G/M/1 model the maximum entropy solution is identical in form to that of the underlying Markov process, but a GE-type distribution still produces the maximum over all similar distributions. For the G/G/1 model there are three main achievements: (i) the spectral methods are extended to give exact formulae for the average number of customers in the system for any G/G/1 queue with rational Laplace transform; previously, these results were obtainable only through simulation and approximation methods. (ii) A maximum entropy model is developed and used to obtain unique solutions for some types of the G/G/1 queue; it is also discussed how these solutions can be related to the corresponding stochastic processes. (iii) The importance of the G/GE/1 and the GE/GE/1 queues for the analysis of general networks is discussed and some flow processes for these systems are characterised. For general queueing networks it is shown that the maximum entropy solution is a product of the maximum entropy solutions of the individual nodes. Accordingly, existing computational algorithms are extended to cover general networks with FCFS disciplines. Some implementations are suggested and a flow algorithm is derived. Finally, these results are used to improve existing aggregation methods. In addition, the study includes a number of examples, comparisons, surveys, useful comments and conclusions.
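The GE-type distribution mentioned above has a simple closed form, standard in the maximum entropy queueing literature and stated here for orientation rather than quoted from the thesis: for mean service rate mu and squared coefficient of variation C^2,

```latex
% GE-type distribution (standard closed form, stated for orientation):
% mean service rate \mu, squared coefficient of variation C^2.
F(x) = \Pr(X \le x) = 1 - \tau\, e^{-\tau\mu x},\qquad x \ge 0,
\qquad \tau = \frac{2}{C^2 + 1}.
% The mass 1-\tau at the origin is the unit impulse term; C^2 = 1
% gives \tau = 1 and recovers the pure exponential distribution.
```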
157. Documenting the making process. Freeman, John (2001)
This thesis documents the construction of a performance project, At Last Sight, which was made with a group of undergraduates at University College Chester. The leader of that project is the same person as the writer of this thesis; this locates the act of writing as something embedded within the process of performance making. The writing forms an address to the unreliability of objective observational analysis. It does so through a resistance to those attempts at impartiality and detachment that might usually be expected in an academic investigation. In this case partiality and involvement are more than central to the investigative process; they form the very structure of enquiry. The body of this work was written at the same time as At Last Sight was being constructed, and the ideas encountered herein possess many of the rhythms of performance making. Space is both somewhere performance is made and an integral aspect of the made work. In a similar way the following chapters amount to more than the site where work has been recorded. In tracing the footprints that led to At Last Sight the thesis reveals itself as an element of that which is being traced. Where At Last Sight revealed the performers as the to-be-watched and also as the watchers, the study functions as the to-be-read and also as the reading. In this way the documentation becomes the documented. This notion of integration between the subject and its study runs through the thesis. Approaching performance analysis as something 'other' creates a gap between it and its subject that can deny the best attempts to bring the two together. Approached in a less compartmentalised way the analysis is allowed to form an indivisible correspondence with the analysed. When the division between the act and the analysis is dissolved the documentation is able to exist as both fixed object and time-based event. Something of the fluidity of process is acknowledged and articulated in each of the sections presented.
158. Performance Analysis of Reliable Multicast Protocols. Celik, Coskun (1 December 2004)
IP multicasting is a method for transmitting the same information to multiple receivers over IP networks. The reliability issue of multicasting involves the challenges of detecting and recovering packet losses and of delivering the entire data in order. In this work, existing reliable multicast protocols are classified into three main groups, namely tree based, NACK-only and router assisted, and a representative protocol for each group is selected to demonstrate the advantages and disadvantages of the corresponding approaches. The selected protocols are SRM, PGM and RMTP. The performance characteristics of these protocols are empirically evaluated using simulation results. Network Simulator 2 (ns-2), a discrete event simulator, is used for the implementation and simulation of the selected protocols. The contributions of the thesis are twofold: the extension of the ns library with an open source implementation of RMTP, which did not exist earlier, and the evaluation of the selected protocols by investigating performance metrics like distribution delay and recovery latency with respect to varying multicast group size, network diameter, link loss rate, etc.
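As a toy illustration of one such metric, and not the thesis's ns-2 setup (all parameters below are invented): under a NACK-based scheme, each repair round costs roughly one round-trip time to the repair source, so recovery latency depends on where repairs come from and how often repair traffic itself is lost.

```python
# Toy Monte Carlo estimate of NACK-based loss-recovery latency.
# All parameters are invented for illustration; the thesis evaluates
# SRM, PGM and RMTP with ns-2 simulations instead.
import random

def mean_recovery_latency(rtt, loss_rate, trials=100_000):
    total = 0.0
    for _ in range(trials):
        rounds = 1
        while random.random() < loss_rate:   # NACK or repair itself lost: retry
            rounds += 1
        total += rounds * rtt                # each repair round costs ~1 RTT
    return total / trials

# Tree-based repair from a nearby designated receiver (short RTT) vs
# NACK-only repair from the original sender (long RTT).
print(mean_recovery_latency(rtt=0.01, loss_rate=0.05))  # local repair
print(mean_recovery_latency(rtt=0.10, loss_rate=0.05))  # end-to-end repair
```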
159. Comparison and End-to-End Performance Analysis of Parallel Filesystems. Kluge, Michael (20 September 2011)
This thesis presents a contribution to the field of performance analysis for Input/Output (I/O) related problems, focusing on the area of High Performance Computing (HPC).
Besides the compute nodes, High Performance Computing systems need a large number of supporting components that add their individual behavior to the overall performance characteristics of the whole system. File systems in such environments especially have their own infrastructure. File operations are typically initiated at the compute nodes and proceed through a deep software stack until the file content arrives at the physical medium. A handful of shortcomings characterize the current state of the art for performance analysis in this area; they concern system-wide data collection, a comprehensive analysis approach for all collected data, trace event analysis adjusted to I/O related problems, and methods to compare current with archived performance data.
This thesis proposes to instrument all soft- and hardware layers to enhance the performance analysis of file operations. The additional information can be used to investigate the performance characteristics of parallel file systems. To perform I/O analyses on HPC systems, a comprehensive approach is needed to gather related performance events, examine the collected data and, if necessary, replay relevant parts on different systems. One larger part of this thesis is dedicated to algorithms that reduce the amount of information found in trace files to the level needed for an I/O analysis. This reduction is based on the assumption that for this type of analysis all I/O events, but only a subset of all synchronization events, of a parallel program trace have to be considered. To extract an I/O pattern from an event trace, only those synchronization points are needed that describe dependencies among different I/O requests. Two algorithms are developed to remove negligible events from the event trace.
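A minimal sketch of the reduction idea, under assumptions about the trace representation (the in-memory event format below is invented; the thesis operates on real trace formats): keep every I/O event, and keep a synchronization event only if it orders I/O requests, i.e. some involved process performs I/O before it and some performs I/O after it.

```python
# Sketch of I/O-oriented trace reduction with an invented event format.
# I/O events always survive; a synchronization event survives only if it
# enforces an ordering between I/O requests on the processes it connects.
def reduce_trace(events):
    """events: list of dicts with keys 'kind' ('io' or 'sync'),
    'time', and 'procs' (processes the event involves)."""
    io_times = {}                                  # per process: times of I/O events
    for e in events:
        if e["kind"] == "io":
            for p in e["procs"]:
                io_times.setdefault(p, []).append(e["time"])

    def orders_io(sync):
        # negligible unless some involved process did I/O before this sync
        # and some involved process does I/O after it
        before = any(t < sync["time"] for p in sync["procs"] for t in io_times.get(p, []))
        after = any(t > sync["time"] for p in sync["procs"] for t in io_times.get(p, []))
        return before and after

    return [e for e in events if e["kind"] == "io" or orders_io(e)]

trace = [
    {"kind": "io",   "time": 1.0, "procs": [0]},
    {"kind": "sync", "time": 2.0, "procs": [0, 1]},   # orders the two I/O events: kept
    {"kind": "io",   "time": 3.0, "procs": [1]},
    {"kind": "sync", "time": 4.0, "procs": [0, 1]},   # no I/O after it: dropped
]
print(reduce_trace(trace))
```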
Considering the related work on the analysis of parallel file systems, the inclusion of counter data from external sources, e.g. the infrastructure of a parallel file system, has been identified as a major milestone towards a holistic analysis approach. This infrastructure contains a large amount of valuable information that is essential for describing performance effects observed in applications. This thesis presents an approach to collect and subsequently process and store these data. Ways to correctly merge the collected values with application traces are discussed. Here, a revised definition of the term "performance counter" is the first step, followed by a tree-based approach to combine raw values into secondary values. A visualization approach for I/O patterns closes another gap in the analysis process.
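The tree-based combination of raw counters into secondary values can be sketched as follows; the structure and node names are illustrative assumptions, not the thesis's implementation:

```python
# Illustrative sketch of a counter tree: leaves hold raw samples, inner
# nodes derive secondary values from their children. Names are invented.
class CounterNode:
    def __init__(self, name, children=(), combine=sum):
        self.name = name
        self.children = list(children)
        self.combine = combine   # how to turn child values into this node's value
        self.raw = None          # set on leaves only

    def value(self):
        if not self.children:
            return self.raw
        return self.combine([c.value() for c in self.children])

# Raw per-server throughput counters (leaves) ...
oss0 = CounterNode("oss0_write_MBps"); oss0.raw = 480.0
oss1 = CounterNode("oss1_write_MBps"); oss1.raw = 350.0
# ... combined into a secondary, file-system-wide value (inner node).
fs_total = CounterNode("fs_write_MBps", children=[oss0, oss1])
print(fs_total.value())   # 830.0
```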
Replaying I/O related performance events or event patterns can be done by a flexible I/O benchmark. The constraints for the development of such a benchmark are identified as well as the overall architecture for a prototype implementation.
Finally, different examples demonstrate the usage of the developed methods and show their potential. All examples are real use cases and are situated on the HRSK research complex and the 100GBit Testbed at TU Dresden. The I/O related parts of a Bioinformatics and a CFD application have been analyzed in depth and enhancements for both are proposed. An instance of a Lustre file system was deployed and tuned on the 100GBit Testbed by the extensive use of external performance counters.
160. A programming model and performance model for cycle stealing. Sumitomo, Jiro (2006)
This work describes a programming model and performance model for cycle stealing on the Internet. Cycle stealing is the use of otherwise idle computers to perform work, and promises high performance computing at relatively low cost. The Internet, being the largest pool of potentially idle computers, is an obvious target for cycle stealing. However, computers connected to the Internet are often protected by firewalls, preventing point-to-point communication between them. The fluctuating availability of computers for cycle stealing as they move in and out of an idle state, combined with the restricted communication of the Internet environment, means that programming models and abstractions suitable for programming supercomputers and clusters are not ideal. Therefore, I have created a programming model for cycle stealing which reflects the types of parallel applications that are suitable for execution using idle computers connected to the Internet. The model is designed for use by non-expert parallel programmers, and I will show how it simplifies the development of cycle stealing applications, enabling rapid application development and straightforward porting of existing sequential applications. This simple to use programming model, combined with the low cost of cycle stealing, improves the accessibility of high performance computing for non-traditional users of supercomputers and clusters. Deployment on the Internet, and the need to navigate through firewalls, suggests a web based framework using common web protocols, web servers and web browsers. Part of this work investigates the feasibility of web based approaches to cycle stealing, from the setup of a cycle stealing system, through application development and deployment, to the connection of potentially idle computers. I designed and implemented a cycle stealing framework, deployable on the web, to meet expectations of performance, reliability, ease of use and safety. Existing cycle stealing frameworks emphasise the need for applications to be decomposed into a set of jobs that execute for a long period, that is, a job should have a computation time sufficient to justify its communication cost. However, there are no tools available for users to determine what an appropriate computation time might be, given a job's data communication requirements. To date, deciding the granularity of jobs has been a matter of intuition. Therefore, a user may experience uncertainty as to the benefit of cycle stealing for their particular application, especially if the application will have relatively short-lived jobs. Based on performance analysis of my framework, I have developed an analytical model and simulator which can be used to predict, and help to optimise, the performance of user applications, and to show the feasibility of executing a particular application using the cycle stealing framework.
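The granularity question admits a simple first-order model, sketched here under assumed cost terms rather than taken from the thesis's analytical model: a job is worthwhile only if its computation time dominates the cost of shipping its data.

```python
# First-order sketch of the job-granularity trade-off in cycle stealing.
# The thesis builds a full analytical model and simulator; the cost terms
# and constants here are invented for illustration.
def job_efficiency(compute_s, data_mb, bandwidth_mbps=8.0, latency_s=0.5):
    """Fraction of a job's wall time spent on useful computation."""
    comm_s = latency_s + data_mb * 8 / bandwidth_mbps   # crude transfer-cost estimate
    return compute_s / (compute_s + comm_s)

for compute_s in (1, 10, 100, 1000):
    print(compute_s, round(job_efficiency(compute_s, data_mb=5), 3))
# Short jobs are dominated by communication; efficiency approaches 1
# only once computation time is much larger than the transfer time.
```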