1

NUMA Data-Access Bandwidth Characterization and Modeling

Braithwaite, Ryan Karl 29 February 2012 (has links)
Clusters of seemingly homogeneous compute nodes are increasingly heterogeneous within each node due to replication and distribution of node-level subsystems. This intra-node heterogeneity can adversely affect program execution performance by inflicting additional data-access performance penalties when accessing non-local data. In many modern NUMA architectures, memory and I/O controllers are distributed within a node, so data-accesses from a given CPU core are logically divided into “local” and “remote” accesses within the system. In this thesis a method for analyzing main memory and PCIe data-access characteristics of modern AMD and Intel NUMA architectures is presented. Also presented here is the synthesis of data-access performance models designed to quantify the effects of these architectural characteristics on data-access bandwidth. Such performance models provide an analytical tool for determining the performance impact of remote data-accesses for a program or access pattern running in a given system. Data-access performance models also provide a means for comparing the data-access bandwidth and attributes of NUMA architectures, for improving application performance when running on these architectures, and for improving process/thread mapping onto CPU cores in these architectures. Preliminary examples of how programs respond to these data-access bandwidth characteristics are also presented as motivation for future work. / Master of Science
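As a rough illustration of the kind of data-access bandwidth model the abstract describes, the sketch below mixes local and remote bandwidths by time-weighting them with the fraction of remote accesses; the bandwidth figures and function names are assumptions for illustration, not values from the thesis.

```python
# Illustrative sketch of a NUMA data-access bandwidth model.
# The bandwidth figures and the mixing rule are assumptions for
# demonstration, not parameters measured in the thesis.

def effective_bandwidth(local_bw_gbs, remote_bw_gbs, remote_fraction):
    """Time-weighted mix of local and remote bandwidth.

    Bytes moved at the slower remote rate take proportionally longer,
    so time per byte is averaged rather than the bandwidths themselves.
    """
    time_per_byte = ((1.0 - remote_fraction) / local_bw_gbs
                     + remote_fraction / remote_bw_gbs)
    return 1.0 / time_per_byte

if __name__ == "__main__":
    # Hypothetical node: 12 GB/s local, 6 GB/s across the interconnect.
    for f in (0.0, 0.25, 0.5, 1.0):
        print(f"remote fraction {f:.2f}: {effective_bandwidth(12.0, 6.0, f):.1f} GB/s")
```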
2

Efficient simulation techniques for large-scale applications

Huang, Jen-Cheng 21 September 2015 (has links)
Architecture simulation is an important performance modeling approach. Modeling hardware components in sufficient detail helps architects identify both hardware and software bottlenecks. However, the major issue with architectural simulation is its huge slowdown compared to native execution, and the slowdown grows for emerging workloads that feature high throughput and massive parallelism, such as GPGPU kernels. In this dissertation, three simulation techniques are proposed to simulate emerging GPGPU kernels and data analytic workloads efficiently. First, TBPoint reduces the number of simulated instructions for GPGPU kernels using inter-launch and intra-launch sampling. Second, GPUmech improves the simulation speed of GPGPU kernels by abstracting the simulation model using functional simulation and analytical modeling. Finally, SimProf applies stratified random sampling with performance counters to select representative simulation points for data analytic workloads, dealing with data-dependent performance. This dissertation presents techniques that can be used to simulate emerging large-scale workloads accurately and efficiently.
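As a loose illustration of the stratified-sampling idea behind SimProf, the sketch below groups execution intervals by a single performance-counter value and samples a few intervals from each stratum; the one-feature stratification, the stratum count, and the synthetic data are assumptions for illustration, not SimProf's actual design.

```python
# Rough sketch of stratified sampling of simulation points.
# Stratifying by one performance-counter feature is an assumption
# for illustration; SimProf's actual features may differ.
import random
from collections import defaultdict

def stratified_sample(intervals, num_strata=4, per_stratum=2, seed=0):
    """intervals: list of (interval_id, counter_value) pairs."""
    rng = random.Random(seed)
    values = sorted(v for _, v in intervals)
    # Equal-frequency stratum boundaries over the counter value.
    bounds = [values[int(len(values) * i / num_strata)] for i in range(1, num_strata)]
    strata = defaultdict(list)
    for iid, v in intervals:
        s = sum(v >= b for b in bounds)   # stratum index 0 .. num_strata-1
        strata[s].append(iid)
    picks = []
    for s, members in sorted(strata.items()):
        picks.extend(rng.sample(members, min(per_stratum, len(members))))
    return picks

if __name__ == "__main__":
    # 100 synthetic intervals with a made-up counter value each.
    intervals = [(i, random.Random(i).random()) for i in range(100)]
    print(stratified_sample(intervals))
```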
3

An Approximate Analytical Model for the Discharge Performance of a Primary Zinc/Air Cell

White, Leo J 12 January 2005 (has links)
The characteristics of a Zinc/Air (Zn/Air) primary cell are discussed. In addition, current technologies and the corresponding electrical performance are introduced. The basic principles of operation of a Zn/Air primary cell are discussed, focusing on the anode, cathode, and electrolyte. Basic kinetic and transport expressions are developed for the two main components of the cell, the anode and cathode compartments, from which an overall formula for the cell polarization is developed. Input parameters are selected and approximated where possible to observe the model's ability to predict potential versus current density. Time-dependent anode performance is modeled using the shrinking core reaction model for the discharge of the zinc particles. The time-dependent dimensionless radius of the zinc particle is then used in conjunction with the developed transport and kinetic expressions to predict the overall cell performance as a function of time. Plots of predicted cell voltage versus time and percent capacity versus time are presented. The simulations indicate an adequate approximate analytical model valid for a variety of drain rates corresponding to current hearing instrument devices on the market.
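As a hedged illustration of the shrinking-core idea used for the anode, the sketch below assumes surface-reaction control, under which the dimensionless particle radius shrinks linearly in time; the time constant and output format are invented for illustration, and the thesis couples this radius to much more detailed kinetic and transport expressions.

```python
# Minimal shrinking-core sketch for zinc particle discharge.
# Assumes surface-reaction control, so the dimensionless radius
# shrinks linearly in time; parameter values are illustrative only.

def dimensionless_radius(t, tau):
    """r/R for a shrinking core under surface-reaction control."""
    return max(0.0, 1.0 - t / tau)

def conversion(t, tau):
    """Fraction of zinc consumed: 1 - (r/R)^3."""
    return 1.0 - dimensionless_radius(t, tau) ** 3

if __name__ == "__main__":
    tau = 10.0  # assumed hours to full particle consumption
    for t in range(0, 11, 2):
        print(f"t = {t:2d} h  r/R = {dimensionless_radius(t, tau):.2f}  "
              f"capacity used = {100 * conversion(t, tau):.0f}%")
```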
4

Garbage Collection in Software Performance Engineering

Libič, Peter January 2015 (has links)
Title: Garbage Collection in Software Performance Engineering. Author: Peter Libič (peter.libic@d3s.mff.cuni.cz). Advisor: doc. Ing. Petr Tůma, Dr. (petr.tuma@d3s.mff.cuni.cz). Department of Distributed and Dependable Systems, Faculty of Mathematics and Physics, Charles University, Malostranské nám. 25, 118 00 Prague, Czech Republic. Abstract: The increasing popularity of languages with automatic memory management makes garbage collector (GC) performance key to effective application execution. Unfortunately, the performance behavior of contemporary GCs is not well understood by application developers and is often ignored by performance model designers. In this thesis, we (1) evaluate the nature of GC overhead with respect to its effect on the accuracy of performance models. We assess the possibility of modeling GC overhead as a black box and identify workload characteristics that contribute to GC performance. Then we (2) design an analytical model of a one-generation collector and a simulation model of both one-generation and two-generation collectors. These models rely on application characteristics. We evaluate the accuracy of such models and analyze their sensitivity to the inputs. Using the model we expose the gap between understanding the GC overhead based on knowing the algorithm...
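A toy picture of a black-box, one-generation collector model like the one the abstract mentions might trigger a collection whenever allocation fills the heap and keep a fixed fraction of the allocated data live; the heap size, allocation rate, survival ratio, and pause cost below are assumed values, not measurements or formulas from the thesis.

```python
# Toy one-generation GC model: a collection triggers whenever allocation
# fills the heap, and a fixed fraction of the allocated data stays live.
# Heap size, allocation rate, survival ratio and pause cost are assumed.

def gc_overhead(alloc_rate_mb_s, heap_mb, survival_ratio,
                pause_per_live_mb_ms, run_seconds):
    live_mb = 0.0
    allocated_mb = 0.0
    total_pause_ms = 0.0
    collections = 0
    for _ in range(run_seconds):
        allocated_mb += alloc_rate_mb_s
        if live_mb + allocated_mb >= heap_mb:
            live_mb += survival_ratio * allocated_mb  # simplistic survivor rule
            allocated_mb = 0.0
            total_pause_ms += pause_per_live_mb_ms * live_mb
            collections += 1
            if live_mb >= heap_mb:
                raise RuntimeError("modelled heap too small for the live set")
    return collections, total_pause_ms

if __name__ == "__main__":
    n, pause = gc_overhead(alloc_rate_mb_s=50, heap_mb=512, survival_ratio=0.05,
                           pause_per_live_mb_ms=1.0, run_seconds=60)
    print(f"{n} collections, {pause:.0f} ms total pause (model estimate)")
```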
5

Performance modeling and enhancement for IEEE 802.11 DCF

Alkadeki, H. H. Z. January 2015 (has links)
The most important standard in wireless local area networks (WLANs) is IEEE 802.11. For this reason, much of the research on enhancing WLANs is based on the behaviour of the IEEE 802.11 standard. The standard is divided into several layers; one of the most important is the medium access control (MAC) layer, which governs access to the transmission medium and data transmission by wireless stations. However, it still presents many challenges related to quality of service (QoS) metrics such as system throughput and access delay, so modelling and performance analysis of the MAC layer are central to the design and enhancement of wireless networks. This research work is therefore devoted to evaluating and enhancing performance models of the IEEE 802.11 MAC distributed coordination function (DCF), which can lead to improvements in QoS metrics. To evaluate the system performance of IEEE 802.11 DCF more accurately, a new analytical model for computing the packet transmission probability of IEEE 802.11 DCF is proposed, based on the different probabilities of events in the transmission mechanism. The saturated throughput is then evaluated with the proposed analytical model. In addition, a new analytical model for estimating the MAC-layer packet delay distribution of IEEE 802.11 DCF is proposed. The results highlight the importance of considering the different probabilities of events in the transmission mechanism for an accurate performance evaluation model of IEEE 802.11 DCF in terms of throughput and delay. To enhance the effectiveness of IEEE 802.11 DCF, a new dynamic control backoff time algorithm is proposed to improve both the delay and throughput of IEEE 802.11 DCF. This algorithm distinguishes between high and low traffic loads in order to handle unsaturated traffic load conditions. In particular, the equilibrium point analysis (EPA) model is used to represent the algorithm under various traffic load conditions. Results of extensive simulation experiments show that the proposed algorithm yields better throughput and better average packet transmission delay than related algorithms.
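For orientation, the sketch below solves the classic Bianchi-style fixed point that saturation-throughput models of 802.11 DCF typically start from; the thesis modifies the per-event transmission probabilities, so this baseline and its contention-window parameters are illustrative only.

```python
# Sketch of the classic Bianchi-style fixed point underlying many DCF
# saturation models. The thesis refines the per-event transmission
# probabilities, so treat this only as the textbook baseline.

def solve_dcf(n, cw_min=32, retry_stages=5):
    """Return (tau, p): per-slot transmit probability and conditional
    collision probability for n saturated stations."""
    def tau_of(p):
        if abs(p - 0.5) < 1e-9:        # sidestep the removable 0/0 at p = 0.5
            p += 1e-9
        return (2.0 * (1.0 - 2.0 * p) /
                ((1.0 - 2.0 * p) * (cw_min + 1)
                 + p * cw_min * (1.0 - (2.0 * p) ** retry_stages)))

    def residual(p):
        # p should equal the chance that at least one of the other
        # n-1 stations transmits in the same slot.
        return p - (1.0 - (1.0 - tau_of(p)) ** (n - 1))

    lo, hi = 1e-6, 1.0 - 1e-6          # residual < 0 at lo, > 0 at hi
    for _ in range(60):                # bisection on p
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    p = 0.5 * (lo + hi)
    return tau_of(p), p

if __name__ == "__main__":
    for n in (5, 10, 20, 50):
        tau, p = solve_dcf(n)
        print(f"n={n:2d}  tau={tau:.4f}  collision prob p={p:.3f}")
```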
6

Evaluating MapReduce System Performance: A Simulation Approach

Wang, Guanying 13 September 2012 (has links)
The scale of data generated and processed is exploding in the Big Data era. The MapReduce system popularized by open-source Hadoop is a powerful tool for this exploding-data problem and is widely employed in many areas involving large-scale data. In many circumstances, hypothetical MapReduce systems must be evaluated, e.g., to provision a new MapReduce system to meet a certain performance goal, to upgrade a currently running system to meet increasing business demands, or to evaluate novel network topologies, new scheduling algorithms, or resource arrangement schemes. The traditional trial-and-error solution involves the time-consuming and costly process of first building a real cluster and then benchmarking it. In this dissertation, we propose to simulate MapReduce systems and to evaluate hypothetical MapReduce systems using simulation. This simulation approach offers significantly lower turn-around time and lower cost than experiments. Simulation cannot entirely replace experiments, but can be used as a preliminary step to reveal potential flaws and gain critical insights. We studied MapReduce systems in detail and developed a comprehensive performance model for MapReduce, including sub-task, phase-level performance models for both map and reduce tasks and a model for resource contention between multiple processes running concurrently. Based on the performance model, we developed a comprehensive simulator for MapReduce, MRPerf. MRPerf is the first full-featured MapReduce simulator. It supports both workload simulation and resource contention, and it still offers the most complete feature set among all MapReduce simulators to date. Using MRPerf, we conducted two case studies to evaluate scheduling algorithms and shared storage in MapReduce, without building real clusters. Furthermore, in order to further integrate simulation and performance prediction into MapReduce systems and leverage predictions to improve system performance, we developed an online prediction framework for MapReduce, which periodically runs simulations within a live Hadoop MapReduce system. The framework can predict task execution within a window in the near future. These predictions can be used by other components in MapReduce systems to improve performance. Our results show that the framework achieves high prediction accuracy and incurs negligible overhead. We present two potential use cases, prefetching and a dynamically adapting scheduler. / Ph. D.
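A back-of-the-envelope version of a phase-level MapReduce cost model might charge each map and reduce task per-phase data-movement costs and schedule tasks in waves over the available slots, as sketched below; the phases, rates, and no-overlap assumption are illustrative and far simpler than MRPerf's actual model.

```python
# Rough phase-level cost sketch for one MapReduce job. The phases,
# rates and the no-overlap assumption are illustrative; MRPerf's
# actual model is considerably more detailed.

def map_task_seconds(split_mb, read_mbps, map_cpu_mbps, spill_mbps):
    return split_mb / read_mbps + split_mb / map_cpu_mbps + split_mb / spill_mbps

def reduce_task_seconds(shuffle_mb, net_mbps, merge_mbps, reduce_cpu_mbps, write_mbps):
    return (shuffle_mb / net_mbps + shuffle_mb / merge_mbps
            + shuffle_mb / reduce_cpu_mbps + shuffle_mb / write_mbps)

def job_seconds(n_maps, n_reduces, map_slots, reduce_slots, t_map, t_reduce):
    # Tasks run in waves: ceil(tasks / slots) sequential waves per phase.
    map_waves = -(-n_maps // map_slots)
    reduce_waves = -(-n_reduces // reduce_slots)
    return map_waves * t_map + reduce_waves * t_reduce

if __name__ == "__main__":
    t_map = map_task_seconds(split_mb=128, read_mbps=100, map_cpu_mbps=200, spill_mbps=80)
    t_red = reduce_task_seconds(shuffle_mb=256, net_mbps=60, merge_mbps=120,
                                reduce_cpu_mbps=150, write_mbps=80)
    print(f"map task  ~{t_map:.1f} s, reduce task ~{t_red:.1f} s")
    print(f"job       ~{job_seconds(400, 32, 40, 16, t_map, t_red):.0f} s")
```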
7

Performance Modeling of Single Processor and Multi-Processor Computer Architectures

Commissariat, Hormazd P. 11 March 2000 (has links)
Determining the optimum computer architecture configuration for a specific application or a generic algorithm is a difficult task. The complexity of today's computer architectures and systems makes it difficult and expensive to implement and test fully functional prototypes of computer architectures easily and economically. High-level VHDL performance modeling of architectures is an efficient way to rapidly prototype and evaluate computer architectures. Once the architecture configuration is fixed, one would like to know the tolerance and expected performance of individual/critical components and the best way to map the software tasks onto the processor(s). Trade-offs and engineering compromises can be analyzed, and the effects of certain component failures and communication bottlenecks can be studied. A part of the research work done for the RASSP (Rapid Prototyping of Application Specific Signal Processors) project, funded by Department of Defense contracts, is documented in this thesis. The architectures modeled include a single-processor, single-global-bus system; a four-processor, single-global-bus system; a four-processor, multiple-local-bus, single-global-bus system; and finally, a four-processor, multiple-local-bus system interconnected by a crossbar switch. The hardware models used are mostly legacy models inherited from an earlier project; they were upgraded, modified, and customized to suit the current research needs and requirements. The software tasks that run on the processors are pieces of the signal and image processing algorithm for Synthetic Aperture Radar (SAR). Communication between components/devices is achieved in the form of tokens, which are record structures. The output is a trace file that tracks the passage of tokens through the various components of the architecture. The trace file is post-processed to obtain activity plots and latency plots for individual components of the architecture. / Master of Science
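The token-passing style of model described above can be pictured as a small loop in which record-like tokens flow through a pipeline of components, accumulate latency, and emit one trace line per hop; the component names, latencies, and trace format below are invented for illustration.

```python
# Toy token-passing performance model: tokens (records) flow through a
# pipeline of components, accumulate latency, and leave one trace line
# per hop. Component names and latencies are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class Token:
    token_id: int
    time_ns: float = 0.0
    trace: list = field(default_factory=list)

PIPELINE = [("local_bus", 40.0), ("processor", 500.0), ("global_bus", 120.0)]

def run(token: Token) -> Token:
    for component, latency_ns in PIPELINE:
        token.time_ns += latency_ns
        token.trace.append(
            f"{token.time_ns:8.1f} ns  token {token.token_id} leaves {component}")
    return token

if __name__ == "__main__":
    for t in (run(Token(i)) for i in range(3)):
        print("\n".join(t.trace))
```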
8

Validation of a Task Network Human Performance Model of Driving

Wojciechowski, Josephine Quinn 24 May 2006 (has links)
Human performance modeling (HPM) is often used to investigate systems during all phases of development. HPM was used to investigate function allocation in crews for future combat vehicles. The tasks required of the operators centered on three primary functions: commanding, gunning, and driving. In initial investigations, the driver appeared to be the crew member with the highest workload. Validation of the driver workload model (DWM) is necessary for confidence in the model's ability to predict workload. Validation would provide quantitative evidence that the workload of driving is high and that additional tasks impact performance. This study consisted of two experiments, each of which measured performance and workload while driving and attending to an auditory secondary task. The first experiment was performed with a human performance model; the second replicated the same conditions in a human-in-the-loop driving simulator. The results of the two experiments were then correlated to determine whether the model could predict performance and workload changes. The results indicate that an auditory task has some impact on driving. The model is a good predictor of mental workload changes with auditory secondary tasks; however, predicted impacts on performance from secondary auditory tasks were not demonstrated in the simulator study. Frequency of the distraction was more influential on changes in performance and workload than the demand of the distraction, at least under the conditions tested in this study. While the workload numbers correlate with the simulator numbers, using the model would require a better understanding of what the workload changes mean in terms of performance measures. / Master of Science
9

Performance Modeling of Multi-core Systems: Caches and Locks

Pan, Xiaoyue January 2016 (has links)
Performance is an important aspect of computer systems since it directly affects user experience. One way to analyze and predict performance is via performance modeling. In recent years, multi-core systems have made processors more powerful while keeping power consumption relatively low. However, the complicated design of these systems makes it difficult to analyze their performance. This thesis presents performance modeling techniques for cache performance and synchronization cost on multi-core systems. A cache can be designed in many ways, with different configuration parameters including cache size, associativity, and replacement policy. Understanding cache performance under different configurations is useful for exploring the design choices. We propose a general modeling framework for estimating the cache miss ratio under different cache configurations, based on the reuse distance distribution. On multi-core systems, each core usually has a private cache, and keeping shared data in private caches coherent has an extra cost. We propose three models to estimate this cost, based on information that can be gathered when running the program on a single core. Locks are widely used as a synchronization primitive in multi-threaded programs on multi-core systems. While they are often necessary for protecting shared data, they also introduce lock contention, which causes performance issues. We present a model to predict how much contention a lock has on multi-core systems, based on information obtainable from profiling a run on a single core. If lock contention is shown to be a performance bottleneck, one way to mitigate it is to use another lock implementation. However, it is costly to investigate whether adopting another lock implementation would reduce lock contention, since this requires reimplementation and measurement. We present a model for forecasting lock contention under another lock implementation without replacing the current one.
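The reuse-distance idea behind the cache model can be illustrated with the textbook rule for a fully associative LRU cache: an access misses exactly when its reuse distance, the number of distinct lines touched since the previous access to the same line, is at least the cache capacity in lines. The toy trace and cache sizes below are illustrative; the thesis' framework covers configurations beyond this special case.

```python
# Reuse-distance sketch: for a fully associative LRU cache, an access
# misses iff its reuse distance >= cache size in lines. This shows the
# modeling idea only; the thesis handles more general configurations.

def reuse_distances(trace):
    """Reuse distance per access: number of distinct lines referenced
    since the previous access to the same line (inf on first access)."""
    last_seen = {}
    distances = []
    for i, line in enumerate(trace):
        if line in last_seen:
            window = trace[last_seen[line] + 1:i]
            distances.append(len(set(window)))
        else:
            distances.append(float("inf"))
        last_seen[line] = i
    return distances

def lru_miss_ratio(trace, cache_lines):
    d = reuse_distances(trace)
    misses = sum(1 for x in d if x >= cache_lines)
    return misses / len(trace)

if __name__ == "__main__":
    trace = ["A", "B", "C", "A", "B", "D", "A", "C", "B", "D"]
    for size in (1, 2, 4, 8):
        print(f"{size}-line LRU cache: miss ratio {lru_miss_ratio(trace, size):.2f}")
```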
10

Workload characterization, controller design and performance evaluation for cloud capacity autoscaling

Ali-Eldin Hassan, Ahmed January 2015 (has links)
This thesis studies cloud capacity auto-scaling, or how to provision and release resources to a service running in the cloud based on its actual demand using an automatic controller. As the performance of server systems depends on the system design, the system implementation, and the workloads the system is subjected to, we focus on these aspects with respect to designing auto-scaling algorithms. Towards this goal, we design and implement two auto-scaling algorithms for cloud infrastructures. The algorithms predict the future load for an application running in the cloud. We discuss the different approaches to designing an auto-scaler combining reactive and proactive control methods, and to handling long-running requests, e.g., tasks running for longer than the actuation interval, in a cloud. We compare the performance of our algorithms with state-of-the-art auto-scalers and evaluate the controllers' performance with a set of workloads. As any controller is designed with an assumption on the operating conditions and system dynamics, the performance of an auto-scaler varies with different workloads. In order to better understand the workload dynamics and evolution, we analyze a 6-year-long workload trace of the sixth most popular Internet website. In addition, we analyze a workload from one of the largest Video-on-Demand streaming services in Sweden. We discuss the popularity of objects served by the two services, the spikes in the two workloads, and the invariants in the workloads. We also introduce a measure for the disorder in a workload, i.e., the amount of burstiness. The measure is based on Sample Entropy, an empirical statistic used in biomedical signal processing to characterize biomedical signals. The introduced measure can be used to characterize workloads based on their burstiness profiles. We compare our measure with the literature on quantifying burstiness in a server workload, and show its advantages. To better understand the tradeoffs between using different auto-scalers with different workloads, we design a framework to compare auto-scalers and give probabilistic guarantees on the performance in worst-case scenarios. Using different evaluation criteria and more than 700 workload traces, we compare six state-of-the-art auto-scalers that we believe represent the development of the field in the past 8 years. Knowing that the auto-scalers' performance depends on the workloads, we design a workload analysis and classification tool that assigns a workload to its most suitable elasticity controller out of a set of implemented controllers. The tool has two main components: an analyzer and a classifier. The analyzer analyzes a workload and feeds the analysis results to the classifier. The classifier assigns a workload to the most suitable elasticity controller based on the workload characteristics and a set of predefined business-level objectives. The tool is evaluated with a set of collected real workloads and a set of generated synthetic workloads. Our evaluation results show that the tool can help a cloud provider improve the QoS provided to the customers.
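The burstiness measure described above builds on Sample Entropy; a direct O(N²) implementation of the standard SampEn statistic looks roughly like the sketch below, with the embedding length and tolerance set to common defaults rather than values taken from the thesis.

```python
# Direct O(N^2) Sample Entropy, the statistic the burstiness measure
# builds on. Embedding length m and tolerance r use common defaults,
# not values from the thesis.
import math
import random

def sample_entropy(series, m=2, r_factor=0.2):
    n = len(series)
    mean = sum(series) / n
    std = (sum((x - mean) ** 2 for x in series) / n) ** 0.5
    r = r_factor * std

    def matches(length):
        # Same template start points for both lengths, as in the
        # standard SampEn definition.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                diff = max(abs(a - b) for a, b in
                           zip(series[i:i + length], series[j:j + length]))
                if diff <= r:
                    count += 1
        return count

    b = matches(m)
    a = matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined for very short or degenerate series
    return -math.log(a / b)

if __name__ == "__main__":
    steady = [100 + (i % 2) for i in range(200)]              # regular load
    rng = random.Random(1)
    bursty = [100 + rng.randint(0, 49) for _ in range(200)]   # irregular load
    print(f"steady workload SampEn: {sample_entropy(steady):.2f}")
    print(f"bursty workload SampEn: {sample_entropy(bursty):.2f}")
```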
