Global ETD Search

1	Finding the limitations of a real-time updated simulation driven web map / Att upptäcka begränsningarna för en realtidsuppdaterad webb-karta i en simuleringsmiljö Svensson, Fritz January 2022 (has links) In this study, built in performance counters are used to identify limits of a prototype real time map web application running in the Google Chrome browser on a desktop PC. The context in which the application is intended for use is one of distributed simulation. This provides the variable conditions of update frequency and number of (simulated) entities to be displayed on the map. The performance counters log the utilization of memory and CPU on the client machine, and in addition to these a metric called delay time is calculated to see how the time used to handle a certain amount of entities relates to the given time interval, determined by the update frequency. Two specific use cases in the simulation context are defined and tested, as well as baselines of browser idle use, local rendering, and excluded rendering. Furthermore, ranges of the variable conditions are tested with the other condition fixed, to see how different combinations of frequency and entity number affect resource use and delay times. Findings include that the high frequency use case results in a continuously high delay time and is not feasible for use without further optimization. Additionally, the application is unlikely to perform better given less communication since rendering and scripting impact resource use the most. The use of memory is not significantly increased from idle use at any time and CPU use is limited by the single threaded parsing of JavaScript resulting in a highest percentage of non-idle time during a test of 31%. Lastly, the growth of delay time occur later in terms of entity updates per second when the number of entities is relatively low and the update frequency is high, as opposed to a lower frequency and a higher number of entities. Performance Counters simulation real-time map Software Engineering Programvaruteknik
2	Does the Halting Necessary for Hardware Trace Collection Inordinately Perturb the Results? Watson, Myles G. 16 November 2004 (has links) (PDF) Processor address traces are invaluable for characterizing workloads and testing proposed memory hierarchies. Long traces are needed to exercise modern cache designs and produce meaningful results, but are difficult to collect with hardware monitors because microprocessors access memory too frequently for disks or other large storage to keep up. The small, fast buffers of the monitors fill quickly; in order to obtain long contiguous traces, the processor must be stopped while the buffer is emptied. This halting may perturb the traces collected, but this cannot be measured directly, since long uninterrupted traces cannot be collected. We make the case that hardware performance counters, which collect runtime statistics without influencing execution, can be used to measure halting effects. We use the performance counters of the Pentium 4 processor to collect statistics while halting the processor as if traces were being collected. We then compare these results to the statistics obtained from unhalted runs. We present our results in terms of which counters are affected, why, and what this means for trace-collection systems. performance counters processor hardware tracing address trace collection Computer Sciences
3	Generating Miss Rate Curves with Low Overhead Using Existing Hardware Walsh, Tom 17 February 2010 (has links) Miss Rate Curves (MRCs) for main memory have been proposed as a representation of memory utilization for use in a range of optimizations in the area of memory man- agement. Various techniques exist for their creation; however, all real-world methods of MRC generation must make trade-offs between overhead and accuracy. Proposals for new hardware techniques exist, but have yet to be implemented in actual hardware. We pro- pose the use of the Intel PEBS (Precise Event-Based Sampling) performance monitoring capability for the task of MRC generation on existing commodity hardware. We use PEBS to generate MRCs and compare them against MRCs generated through instrumentation, finding the PEBS MRCs to be good, but imperfect approximations, while keeping average PEBS overheads below 5%. We were unable to show that PEBS is better or worse than existing techniques, but believe we have succeeded in showing the promise of the use of general purpose performance monitoring hardware for this task and in motivating future research and development in this area. Miss Rate Curves Hardware Performance Counters Memory Management Operating Systems 0984
4	Generating Miss Rate Curves with Low Overhead Using Existing Hardware Walsh, Tom 17 February 2010 (has links) Miss Rate Curves (MRCs) for main memory have been proposed as a representation of memory utilization for use in a range of optimizations in the area of memory man- agement. Various techniques exist for their creation; however, all real-world methods of MRC generation must make trade-offs between overhead and accuracy. Proposals for new hardware techniques exist, but have yet to be implemented in actual hardware. We pro- pose the use of the Intel PEBS (Precise Event-Based Sampling) performance monitoring capability for the task of MRC generation on existing commodity hardware. We use PEBS to generate MRCs and compare them against MRCs generated through instrumentation, finding the PEBS MRCs to be good, but imperfect approximations, while keeping average PEBS overheads below 5%. We were unable to show that PEBS is better or worse than existing techniques, but believe we have succeeded in showing the promise of the use of general purpose performance monitoring hardware for this task and in motivating future research and development in this area. Miss Rate Curves Hardware Performance Counters Memory Management Operating Systems 0984
5	Contention-Aware Scheduling for SMT Multicore Processors Feliu Pérez, Josué 27 March 2017 (has links) The recent multicore era and the incoming manycore/manythread era generate a lot of challenges for computer scientists going from productive parallel programming, over network congestion avoidance and intelligent power management, to circuit design issues. The ultimate goal is to squeeze out as much performance as possible while limiting power and energy consumption and guaranteeing a reliable execution. The increasing number of hardware contexts of current and future systems makes the scheduler an important component to achieve this goal, as there is often a combinatorial amount of different ways to schedule the distinct threads or applications, each with a different performance due to the inter-application interference. Picking an optimal schedule can result in substantial performance gains. This thesis deals with inter-application interference, covering the problems this fact causes on performance and fairness on actual machines. The study starts with single-threaded multicore processors (Intel Xeon X3320), follows with simultaneous multithreading (SMT) multicores supporting up to two threads per core (Intel Xeon E5645), and goes to the most highly threaded per-core processor that has ever been built (IBM POWER8). The dissertation analyzes the main contention points of each experimental platform and proposes scheduling algorithms that tackle the interference arising at each contention point to improve the system throughput and fairness. First we analyze contention through the memory hierarchy of current multicore processors. The performed studies reveal high performance degradation due to contention on main memory and any shared cache the processors implement. To mitigate such contention, we propose different bandwidth-aware scheduling algorithms with the key idea of balancing the memory accesses through the workload execution time and the cache requests among the different caches at each cache level. The high interference that different applications suffer when running simultaneously on the same SMT core, however, does not only affect performance, but can also compromise system fairness. In this dissertation, we also analyze fairness in current SMT multicores. To improve system fairness, we design progress-aware scheduling algorithms that estimate, at runtime, how the processes progress, which allows to improve system fairness by prioritizing the processes with lower accumulated progress. Finally, this dissertation tackles inter-application contention in the IBM POWER8 system with a symbiotic scheduler that addresses overall SMT interference. The symbiotic scheduler uses an SMT interference model, based on CPI stacks, that estimates the slowdown of any combination of applications if they are scheduled on the same SMT core. The number of possible schedules, however, grows too fast with the number of applications and makes unfeasible to explore all possible combinations. To overcome this issue, the symbiotic scheduler models the scheduling problem as a graph problem, which allows finding the optimal schedule in reasonable time. In summary, this thesis addresses contention in the shared resources of the memory hierarchy and SMT cores of multicore processors. We identify the main contention points of three systems with different architectures and propose scheduling algorithms to tackle contention at these points. The evaluation on the real systems shows the benefits of the proposed algorithms. The symbiotic scheduler improves system throughput by 6.7\% over Linux. Regarding fairness, the proposed progress-aware scheduler reduces Linux unfairness to a third. Besides, since the proposed algorithm are completely software-based, they could be incorporated as scheduling policies in Linux and used in small-scale servers to achieve the mentioned benefits. / La actual era multinúcleo y la futura era manycore/manythread generan grandes retos en el área de la computación incluyendo, entre otros, la programación paralela productiva o la gestión eficiente de la energía. El último objetivo es alcanzar las mayores prestaciones limitando el consumo energético y garantizando una ejecución confiable. El incremento del número de contextos hardware de los sistemas hace que el planificador se convierta en un componente importante para lograr este objetivo debido a que existen múltiples formas diferentes de planificar las aplicaciones, cada una con distintas prestaciones debido a las interferencias que se producen entre las aplicaciones. Seleccionar la planificación óptima puede proporcionar importantes mejoras de prestaciones. Esta tesis se ocupa de las interferencias entre aplicaciones, cubriendo los problemas que causan en las prestaciones y equidad de los sistemas actuales. El estudio empieza con procesadores multinúcleo monohilo (Intel Xeon X3320), sigue con multinúcleos con soporte para la ejecución simultanea (SMT) de dos hilos (Intel Xeon E5645), y llega al procesador que actualmente soporta un mayor número de hilos por núcleo (IBM POWER8). La disertación analiza los principales puntos de contención en cada plataforma y propone algoritmos de planificación que mitigan las interferencias que se generan en cada uno de ellos para mejorar la productividad y equidad de los sistemas. En primer lugar, analizamos la contención a lo largo de la jerarquía de memoria. Los estudios realizados revelan la alta degradación de prestaciones provocada por la contención en memoria principal y en cualquier cache compartida. Para mitigar esta contención, proponemos diversos algoritmos de planificación cuya idea principal es distribuir los accesos a memoria a lo largo del tiempo de ejecución de la carga y las peticiones a las caches entre las diferentes caches compartidas en cada nivel. Las altas interferencias que sufren las aplicaciones que se ejecutan simultáneamente en un núcleo SMT, sin embargo, no solo afectan a las prestaciones, sino que también pueden comprometer la equidad del sistema. En esta tesis, también abordamos la equidad en los actuales multinúcleos SMT. Para mejorarla, diseñamos algoritmos de planificación que estiman el progreso de las aplicaciones en tiempo de ejecución, lo que permite priorizar los procesos con menor progreso acumulado para reducir la inequidad. Finalmente, la tesis se centra en la contención entre aplicaciones en el sistema IBM POWER8 con un planificador simbiótico que aborda la contención en todo el núcleo SMT. El planificador simbiótico utiliza un modelo de interferencia basado en pilas de CPI que predice las prestaciones para la ejecución de cualquier combinación de aplicaciones en un núcleo SMT. El número de posibles planificaciones, no obstante, crece muy rápido y hace inviable explorar todas las posibles combinaciones. Por ello, el problema de planificación se modela como un problema de teoría de grafos, lo que permite obtener la planificación óptima en un tiempo razonable. En resumen, esta tesis aborda la contención en los recursos compartidos en la jerarquía de memoria y el núcleo SMT de los procesadores multinúcleo. Identificamos los principales puntos de contención de tres sistemas con diferentes arquitecturas y proponemos algoritmos de planificación para mitigar esta contención. La evaluación en sistemas reales muestra las mejoras proporcionados por los algoritmos propuestos. Así, el planificador simbiótico mejora la productividad, en promedio, un 6.7% con respecto a Linux. En cuanto a la equidad, el planificador que considera el progreso consigue reducir la inequidad de Linux a una tercera parte. Además, dado que los algoritmos propuestos son completamente software, podrían incorporarse como políticas de planificación en Linux y usarse en servidores a pequeña escala para obtener los benefi / L'actual era multinucli i la futura era manycore/manythread generen grans reptes en l'àrea de la computació incloent, entre d'altres, la programació paral·lela productiva o la gestió eficient de l'energia. L'últim objectiu és assolir les majors prestacions limitant el consum energètic i garantint una execució confiable. L'increment del número de contextos hardware dels sistemes fa que el planificador es convertisca en un component important per assolir aquest objectiu donat que existeixen múltiples formes distintes de planificar les aplicacions, cadascuna amb unes prestacions diferents degut a les interferències que es produeixen entre les aplicacions. Seleccionar la planificació òptima pot donar lloc a millores importants de les prestacions. Aquesta tesi s'ocupa de les interferències entre aplicacions, cobrint els problemes que provoquen en les prestacions i l'equitat dels sistemes actuals. L'estudi comença amb processadors multinucli monofil (Intel Xeon X3320), segueix amb multinuclis amb suport per a l'execució simultània (SMT) de dos fils (Intel Xeon E5645), i arriba al processador que actualment suporta un major nombre de fils per nucli (IBM POWER8). Aquesta dissertació analitza els principals punts de contenció en cada plataforma i proposa algoritmes de planificació que aborden les interferències que es generen en cadascun d'ells per a millorar la productivitat i l'equitat dels sistemes. En primer lloc, estudiem la contenció al llarg de la jerarquia de memòria en els processadors multinucli. Els estudis realitzats revelen l'alta degradació de prestacions provocada per la contenció en memòria principal i en qualsevol cache compartida. Per a mitigar la contenció, proposem diversos algoritmes de planificació amb la idea principal de distribuir els accessos a memòria al llarg del temps d'execució de la càrrega i les peticions a les caches entre les diferents caches compartides en cada nivell. Les altes interferències que sofreixen las aplicacions que s'executen simultàniament en un nucli SMT, no obstant, no sols afecten a las prestacions, sinó que també poden comprometre l'equitat del sistema. En aquesta tesi, també abordem l'equitat en els actuals multinuclis SMT. Per a millorar-la, dissenyem algoritmes de planificació que estimen el progrés de les aplicacions en temps d'execució, el que permet prioritzar els processos amb menor progrés acumulat para a reduir la inequitat. Finalment, la tesi es centra en la contenció entre aplicacions en el sistema IBM POWER8 amb un planificador simbiòtic que aborda la contenció en tot el nucli SMT. El planificador simbiòtic utilitza un model d'interferència basat en piles de CPI que prediu les prestacions per a l'execució de qualsevol combinació d'aplicacions en un nucli SMT. El nombre de possibles planificacions, no obstant, creix molt ràpid i fa inviable explorar totes les possibles combinacions. Per resoldre aquest contratemps, el problema de planificació es modela com un problema de teoria de grafs, la qual cosa permet obtenir la planificació òptima en un temps raonable. En resum, aquesta tesi aborda la contenció en els recursos compartits en la jerarquia de memòria i el nucli SMT dels processadors multinucli. Identifiquem els principals punts de contenció de tres sistemes amb diferents arquitectures i proposem algoritmes de planificació per a mitigar aquesta contenció. L'avaluació en sistemes reals mostra les millores proporcionades pels algoritmes proposats. Així, el planificador simbiòtic millora la productivitat una mitjana del 6.7% respecte a Linux. Pel que fa a l'equitat, el planificador que considera el progrés aconsegueix reduir la inequitat de Linux a una tercera part. A més, donat que els algoritmes proposats son completament software, podrien incorporar-se com a polítiques de planificació en Linux i emprar-se en servidors a petita escala per obtenir els avantatges mencionats. / Feliu Pérez, J. (2017). Contention-Aware Scheduling for SMT Multicore Processors [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/79081 / Premios Extraordinarios de tesis doctorales process scheduling bandwidth contention symbiotic scheduling chip multicore processors simultaneous multithreading (SMT) performance counters
6	Effective techniques for understanding and improving data structure usage Jung, Changhee 20 September 2013 (has links) Turing Award winner Niklaus Wirth famously noted, `Algorithms + Data Structures = Programs', and it follows that data structures should be carefully considered for effective application development. In fact, data structures are the main focus of program understanding, performance engineering, bug detection, and security enhancement, etc. Our research is aimed at providing effective techniques for analyzing and improving data structure usage in fundamentally new approaches: First, detecting data structures; identifying what data structures are used within an application is a critical step toward application understanding and performance engineering. Second, selecting efficient data structures; analyzing data structures' behavior can recognize improper use of data structures and suggest alternative data structures better suited for the current situation where the application runs. Third, detecting memory leaks for data structures; tracking data accesses with little overhead and their careful analysis can enable practical and accurate memory leak detection. Finally, offloading time-consuming data structure operations; By leveraging a dedicated helper thread that executes the operations on the behalf of the application thread, we can improve the overall performance of the application. Data structure identiﬁcation Memory graphs Interface functions Data structure selection Application generator Training framework Performance counters Data structures (Computer science)
7	PROCESSOR TEMPERATURE AND RELIABILITY ESTIMATION USING ACTIVITY COUNTERS Chhablani, Mayank 23 March 2016 (has links) With the advent of technology scaling lifetime reliability is an emerging threat in high-performance and deadline-critical systems. High on-chip thermal gradients accelerates localised thermal elevations (hotspots) which increases the aging rate of the semiconductor devices. As a result, reliable operation of the processors has become a challenging task. Therefore, cost effective schemes for estimating temperature and reliability are crucial. In this work we present a reliability estimation scheme that is based on a light-weight temperature estimation technique that monitors hardware events. Unlike previously pro- posed hardware counter-based approaches, our approach involves a linear-temporal-feedback estimator, taking into account the effects of thermal inertia. The proposed approach shows an average absolute error of We then present a counter-based technique to estimate the thermal accelerated aging factor (TAAF), which is an indicator of lifetime reliability. Results demonstrate that the estimation error is within [−3, +5]. Reliability Estimation Temperature Monitoring Performance Counters Localized hotspots Temperature Estimation Computer and Systems Architecture
8	Detection of side-channel attacks targeting Intel SGX / Detektion av attacker mot Intel SGX Lantz, David January 2021 (has links) In recent years, trusted execution environments like Intel SGX have allowed developers to protect sensitive code inside so called enclaves. These enclaves protect its code and data even in the cases of a compromised OS. However, SGX enclaves have been shown to be vulnerable to numerous side-channel attacks. Therefore, there is a need to investigate ways that such attacks against enclaves can be detected. This thesis investigates the viability of using performance counters to detect an SGX-targeting side-channel attack, specifically the recent Load Value Injection (LVI) class of attacks. A case study is thus presented where performance counters and a threshold-based detection method is used to detect variants of the LVI attack. The results show that certain attack variants could be reliably detected using this approach without false positives for a range of benign applications. The results also demonstrate reasonable levels of speed and overhead for the detection tool. Some of the practical limitations of using performance counters, particularly in an SGX-context, are also brought up and discussed. Security Trusted execution environment Intel SGX side channel attacks Load value injection detection performance counters Computer Sciences Datavetenskap (datalogi)
9	Detecting Memory-Boundedness with Hardware Performance Counters Molka, Daniel, Schöne, Robert, Hackenberg, Daniel, Nagel, Wolfgang E. 23 April 2019 (has links) Modern processors incorporate several performance monitoring units, which can be used to count events that occur within different components of the processor. They provide access to information on hardware resource usage and can therefore be used to detect performance bottlenecks. Thus, many performance measurement tools are able to record them complementary to information about the application behavior. However, the exact meaning of the supported hardware events is often incomprehensible due to the system complexity and partially lacking or even inaccurate documentation. For most events it is also not documented whether a certain rate indicates a saturated resource usage. Therefore, it is usually diffcult to draw conclusions on the performance impact from the observed event rates. In this paper, we evaluate whether hardware performance counters can be used to measure the capacity utilization within the memory hierarchy and estimate the impact of memory accesses on the achieved performance. The presented approach is based on a small selection of micro-benchmarks that constantly stress individual components in the memory subsystem, ranging from caches to main memory. These workloads are used to identify hardware performance counters that provide good estimates for the utilization of individual components in the memory hierarchy. However, since access latencies can be interleaved with computing instructions, a high utilization of the memory hierarchy does not necessarily result in low performance. We therefore also investigate which stall counters provide good estimates for the number of cycles that are actually spent waiting for the memory hierarchy. info:eu-repo/classification/ddc/004 ddc:004
10	A Dynamic Reconfiguration Framework to Maximize Performance/Power in Asymmetric Multicore Processors Annamalai, Arunachalam 01 January 2013 (has links) (PDF) Recent trends in technology scaling have shifted the processing paradigm to multicores. Depending on the characteristics of the cores, the multicores can be either symmetric or asymmetric. Prior research has shown that Asymmetric Multicore Processors (AMPs) outperform their symmetric (SMP) counterparts within a given resource and power budget. But, due to the heterogeneity in core-types and time-varying workload behavior, thread-to-core assignment is always a challenge in AMPs. As the computational requirements vary significantly across different applications and with time, there is a need to dynamically allocate appropriate computational resources on demand to suit the applications’ current needs, in order to maximize the performance and minimize the energy consumption. Performance/power of the applications could be further increased by dynamically adapting the voltage and frequency of the cores to better fit the changing characteristics of the workloads. Not only can a core be forced to a low power mode when its activity level is low, but the power saved by doing so could be opportunistically re-budgeted to the other cores to boost the overall system throughput. To this end, we propose a novel solution that seamlessly combines heterogeneity with a Dynamic Reconfiguration Framework (DRF). The proposed dynamic reconfiguration framework is equipped with Dynamic Resource Allocation (DRA) and Voltage/Frequency Adaptation (DVFA) capabilities to adapt the core resources and operating conditions at runtime to the changing demands of the applications. As a proof of concept, we illustrate our proposed approach using a dual-core AMP and demonstrate significant performance/power benefits over various baselines. Asymmetric Multicore Processors (AMP) Core Morphing Thread scheduling Performance/power prediction Hardware performance counters (HPCs) Computer Engineering Electrical and Computer Engineering

Search results