Global ETD Search

1	SIMD-Swift: Improving Performance of Swift Fault Detection Oleksenko, Oleksii 20 January 2016 (has links) (PDF) The general tendency in modern hardware is an increase in fault rates, which is caused by the decreased operation voltages and feature sizes. Previously, the issue of hardware faults was mainly approached only in high-availability enterprise servers and in safety-critical applications, such as transport or aerospace domains. These fields generally have very tight requirements, but also higher budgets. However, as fault rates are increasing, fault tolerance solutions are starting to be also required in applications that have much smaller profit margins. This brings to the front the idea of software-implemented hardware fault tolerance, that is, the ability to detect and tolerate hardware faults using software-based techniques in commodity CPUs, which allows to get resilience almost for free. Current solutions, however, are lacking in performance, even though they show quite good fault tolerance results. This thesis explores the idea of using the Single Instruction Multiple Data (SIMD) technology for executing all program\'s operations on two copies of the same data. This idea is based on the observation that SIMD is ubiquitous in modern CPUs and is usually an underutilized resource. It allows us to detect bit-flips in hardware by a simple comparison of two copies under the assumption that only one copy is affected by a fault. We implemented this idea as a source-to-source compiler which performs hardening of a program on the source code level. The evaluation of our several implementations shows that it is beneficial to use it for applications that are dominated by arithmetic or logical operations, but those that have more control-flow or memory operations are actually performing better with the regular instruction replication. For example, we managed to get only 15% performance overhead on Fast Fourier Transformation benchmark, which is dominated by arithmetic instructions, but memory-access-dominated Dijkstra algorithm has shown a high overhead of 200%. SIMD SSE Fault-Tolerance SIMD SSE Fault-Tolerance ddc:004 rvk:ST 150 rvk:ST 170
2	Effiziente externe Beobachtung von CPU-Aktivitäten auf SoCs Weiss, Alexander 28 October 2015 (has links) (PDF) Die umfassende Beobachtbarkeit von System‐on‐Chips (SoCs) ist eine wichtige Voraussetzung für das effiziente Testen und Debuggen eingebetteter Systeme. Ausgehend von einer Analyse verschiedener Anwendungsfälle ergibt sich ein Katalog von Anforderungen an die Beobachtbarkeit von SoCs. Ein wichtiges Kriterium ist hier die Vollständigkeit der Beobachtung und umfasst die Aktivitäten der CPU (ausgeführte Instruktionen, gelesene und geschriebene Daten, Verhalten des Caches, Ausführungszeiten), des Bussystems und von Umgebungsbedingungen. Weitere Kriterien sind die Echtzeitfähigkeit und die Kontinuität der Beobachtung sowie die gleichzeitige Durchführung verschiedener Beobachtungsaufgaben. Dabei soll es zu einer möglichst geringen Beeinflussung des SoCs kommen. Weitere wichtige Aspekt sind die Kosten der Lösung, die Universalität, die Skalierbarkeit sowie die Latenz der Verfügbarkeit der Beobachtungsergebnisse. Für viele Anwendungen, besonders in sicherheitskritischen Bereichen, muss zudem nachgewiesen werden, dass das Beobachtungsverfahren kein Fehlverhalten des SoCs bewirkt bzw. ein solches maskiert. Eine besondere Herausforderung stellen Multiprozessor‐SoCs (MPSoCs) dar, da hier die Kommunikation zwischen den einzelnen CPUs im Inneren des SoC stattfindet und entsprechend schwierig für einen externen Bobachter sichtbar zu machen ist. Der Stand der Technik zur Beobachtung von SoCs wird im Wesentlichen durch zwei Verfahren dargestellt. Bei der Software‐Instrumentierung wird zum funktionalen Programmcode zusätzlicher Code hinzugefügt, welcher zur Beobachtung des Programms dient. Diese Methode ist einfach und universell anwendbar, erfüllt aber die genannten Kriterien nur sehr eingeschränkt. Nachteilig ist hier der Ressourcenverbrauch im Falle des Verbleibs der Instrumentierung im fertigen Produkt. Wird die Instrumentierung nur temporär dem Code hinzugefügt, muss sichergestellt werden, dass das Beobachtungsergebnis auch für den finalen Code anwendbar ist – was besonders bei ressourcen‐abhängigen Integrationstests nur schwierig erfüllbar ist. Eine alternative Lösung stellt eine spezielle Hardware‐Unterstützung in SoCs („embedded Trace“) dar. Hier werden im SoC Zustandsinformationen (z.B. Taskwechsel, ausgeführte Instruktionen, Datentransfers) gesammelt und mittels Trace‐Nachrichten an den Beobachter übermittelt. Dabei stellt die Bandbreite, die zur Ausgabe der Trace‐Nachrichten vom SoC verfügbar ist, ein entscheidendes Nadelöhr dar ‐ im SoC sind viel mehr den Beobachter interessierende Informationen verfügbar als nach außen transferiert werden können. Damit haben beide dem gegenwärtige Stand der Technik entsprechende Beobachtungsverfahren eine Reihe von Einschränkungen, die sich besonders bei der Vollständigkeit der Beobachtung, der Flexibilität, der Kontinuität und der Unterstützung von MPSoCs zeigen. In dieser Arbeit wird nun ein neuer Ansatz vorgestellt, welcher gegenüber dem Stand der Technik in einigen Bereichen deutliche Verbesserungen bietet. Dabei werden die Trace‐Daten nicht vom zu beobachtenden SoC direkt, sondern aus einer parallel mitlaufenden Emulation gewonnen. Die Bandbreite der für die Synchronisation der Emulation erforderlichen Daten ist in vielen Fällen deutlich geringer als bei der Ausgabe von umfassenden Trace‐Nachrichten mittels „embedded Trace“‐Lösungen. Gleichzeitig ist eine vollständige, äußerst detaillierte Beobachtung der Vorgänge innerhalb des SoC möglich. Das neue Beobachtungsverfahren wurde mittels verschiedener FPGA-basierter Implementierungen evaluiert, hier konnte auch die Anwendbarkeit für MPSoCs gezeigt werden. System‐on‐Chip Debuggen Beobachtbarkeit Trace system on chip observation debugging failure ddc:004 rvk:ST 170
3	Exploiting Speculative and Asymmetric Execution on Multicore Architectures Wamhoff, Jons-Tobias 27 March 2015 (has links) (PDF) The design of microprocessors is undergoing radical changes that affect the performance and reliability of hardware and will have a high impact on software development. Future systems will depend on a deep collaboration between software and hardware to cope with the current and predicted system design challenges. Instead of higher frequencies, the number of processor cores per chip is growing. Eventually, processors will be composed of cores that run at different speeds or support specialized features to accelerate critical portions of an application. Performance improvements of software will only result from increasing parallelism and introducing asymmetric processing. At the same time, substantial enhancements in the energy efficiency of hardware are required to make use of the increasing transistor density. Unfortunately, the downscaling of transistor size and power will degrade the reliability of the hardware, which must be compensated by software. In this thesis, we present new algorithms and tools that exploit speculative and asymmetric execution to address the performance and reliability challenges of multicore architectures. Our solutions facilitate both the assimilation of software to the changing hardware properties as well as the adjustment of hardware to the software it executes. We use speculation based on transactional memory to improve the synchronization of multi-threaded applications. We show that shared memory synchronization must not only be scalable to large numbers of cores but also robust such that it can guarantee progress in the presence of hardware faults. Therefore, we streamline transactional memory for a better throughput and add fault tolerance mechanisms with a reduced overhead by speculating optimistically on an error-free execution. If hardware faults are present, they can manifest either in a single event upset or crashes and misbehavior of threads. We address the former by applying transactions to checkpoint and replicate the state such that threads can correct and continue their execution. The latter is tackled by extending the synchronization such that it can tolerate crashes and misbehavior of other threads. We improve the efficiency of transactional memory by enabling a lightweight thread that always wins conflicts and significantly reduces the overheads. Further performance gains are possible by exploiting the asymmetric properties of applications. We introduce an asymmetric instrumentation of transactional code paths to enable applications to adapt to the underlying hardware. With explicit frequency control of individual cores, we show how applications can expose their possibly asymmetric computing demand and dynamically adjust the hardware to make a more efficient usage of the available resources. Mehrkernprozessoren Multithreading Synchronisierung Transaktionaler Speicher Frequenzskalierung multicore multithreading locks speculation transactional memory dynamic voltage and frequency scaling ddc:004 rvk:ST 170
4	Effiziente Mehrkernarchitektur für eingebettete Java-Bytecode-Prozessoren Zabel, Martin 21 February 2012 (has links) (PDF) Die Java-Plattform bietet viele Vorteile für die schnelle Entwicklung komplexer Software. Für die Ausführung des Java-Bytecodes auf eingebetteten Systemen eignen sich insbesondere Java-(Bytecode)-Prozessoren, die den Java-Bytecode als nativen Befehlssatz unterstützen. Die vorliegende Arbeit untersucht detailliert die Gestaltung einer Mehrkernarchitektur für Java-Prozessoren zur effizienten Nutzung der auf Thread-Ebene ohnehin vorhandenen Parallelität eines Java-Programms. Für die Funktionalitäts- und Leistungsbewertung eines Prototyps wird eine eigene Trace-Architektur eingesetzt. Es wird eine hohe Leistungssteigerung bei nur geringem zusätzlichem Hardwareaufwand erzielt sowie eine höhere Leistung als bekannte alternative Ansätze erreicht. Java-Bytecode-Prozessor Mehrkernprozessor Eingebettetes System Java-Bytecode-Processor Multi-Core Embedded System ddc:004 rvk:ST 153 rvk:ST 170 Mehrkernprozessor Java Byte-Code Eingebettetes System
5	Software Controlled Clock Modulation for Energy Efficiency Optimization on Intel Processors Schöne, Robert, Ilsche, Thomas, Bielert, Mario, Molka, Daniel, Hackenberg, Daniel 24 October 2017 (has links) (PDF) Current Intel processors implement a variety of power saving features like frequency scaling and idle states. These mechanisms limit the power draw and thereby decrease the thermal dissipation of the processors. However, they also have an impact on the achievable performance. The various mechanisms significantly differ regarding the amount of power savings, the latency of mode changes, and the associated overhead. In this paper, we describe and closely examine the so-called software controlled clock modulation mechanism for different processor generations. We present results that imply that the available documentation is not always correct and describe when this feature can be used to improve energy efficiency. We additionally compare it against the more popular feature of dynamic voltage and frequency scaling and develop a model to decide which feature should be used to optimize inter-process synchronizations on Intel Haswell-EP processors. Mikroprozessoren Intel Hardware-Analyse Performance-Analyse Systemmodellierung Taktmodulation Dynamic voltage and frequency scaling Energieeffizienz Microprocessors Intel Hardware analysis Performance analysis Systems modeling Clock modulation Dynamic voltage and frequency scaling Energy efficiency ddc:004 rvk:ST 170

1

Page generated in 0.0203 seconds