41

Compiler-Directed Error Resilience for Reliable Computing

Liu, Qingrui 08 August 2018 (has links)
Error resilience has become as important as power and performance in modern computing architectures. There are various sources of errors that can paralyze real-world computing systems. Of particular interest to this dissertation are single-event errors: they can be the result of an energetic particle strike or an abrupt power outage that corrupts the program state, leading to system failures. Specifically, energetic particle strikes are the major cause of soft errors, while an abrupt power outage can result in memory inconsistency in nonvolatile memory systems. Unfortunately, existing techniques to handle these single-event errors are either resource-consuming (e.g., hardware approaches) or heavyweight (e.g., software approaches). To address this problem, this dissertation identifies idempotent processing as an alternative recovery technique that handles system failures in an efficient and low-cost manner. It first proposes a compiler-directed lightweight methodology that leverages idempotent processing and state-of-the-art sensor-based detection to achieve soft error resilience at low cost. It also introduces a lightweight soft-error-tolerant hardware design that redefines idempotent processing so that idempotent regions can be created, verified, and recovered from the processor's point of view. Furthermore, it proposes a series of compiler optimizations that significantly reduce the hardware and runtime overhead of idempotent processing. Lastly, it proposes a failure-atomic system integrated with idempotent processing to resolve another type of single-event error: failure-induced memory inconsistency in nonvolatile memory systems. / Ph. D. / Our computing systems are vulnerable to different kinds of errors, all of which can potentially crash real-world computing systems. This dissertation specifically addresses the challenges of single-event errors, which can be caused by energetic particle strikes or an abrupt power outage that corrupts the program state, leading to system failures. Unfortunately, existing techniques to handle these single-event errors are expensive in terms of hardware or software. To address this problem, this dissertation leverages an interesting program property called idempotence. A region of code is idempotent if and only if it always generates the same output whenever the program jumps back to the region entry from any execution point within the region. We can therefore use idempotence as a low-cost recovery technique, recovering from failures by jumping back to the beginning of the region in which the error occurred. This dissertation proposes solutions that exploit the idempotence property for resilience against single-event errors, and it introduces a series of optimization techniques with compiler and hardware support that improve the efficiency and reduce the overheads of error resilience. We believe the techniques proposed in this dissertation can inspire future error resilience research.
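To make the recovery idea concrete, here is a minimal sketch of idempotence-based recovery, assuming a toy model in which a region reads its live-in state but never overwrites it; the names (idempotent_region, run_with_reexecution) and the sensor-style error flag are illustrative assumptions, not the dissertation's actual implementation:

```python
import random

def idempotent_region(live_in):
    """Writes only to locations disjoint from its live-in set, so
    re-executing from the region entry always yields the same live-out."""
    a, b = live_in["a"], live_in["b"]        # inputs are read, never written
    return {"sum": a + b, "prod": a * b}     # outputs go to fresh locations

def run_with_reexecution(region, live_in, fault_rate=0.3):
    """Detect-and-retry loop: when an error is detected inside the region,
    recovery is simply a jump back to the region entry."""
    while True:
        out = region(live_in)
        if random.random() >= fault_rate:    # stand-in for a sensor-based detector
            return out                        # inputs were never clobbered, so out is valid

print(run_with_reexecution(idempotent_region, {"a": 3, "b": 4}))
```

Because the region never clobbers its own inputs, re-execution is safe no matter where inside the region the error struck, which is what makes this recovery path so cheap compared with checkpointing.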
42

Entwicklung eines Modelica Compiler BackEnds für große Modelle / Development of a Modelica Compiler BackEnd for Large Models

Frenkel, Jens 03 February 2014 (has links) (PDF)
The symbolic processing of mathematical models is an essential prerequisite for studying the dynamics of complex physical systems by means of numerical simulation. Compared with signal-flow-based modeling, equation-based object-oriented languages offer the advantage that the equations arising from mathematics and physics can be used in their original form to describe the model. The acausal description with equations increases the reusability of the models and reduces the modeling effort. In theory, the automated transformation of equations into assignments can be applied to models of arbitrary size and complexity. In practice, however, transforming large models produces mathematical systems with very many equations and time-dependent variables, and the resulting long execution times severely limit the applicability of symbolic processing. This thesis describes the complete process of automatically transforming a model into code that can be simulated numerically. Every sub-step is examined in detail and assessed with respect to its theoretical complexity. Suitable algorithms are implemented in the OpenModelica Compiler, their execution times are compared on models of practical engineering relevance, and the best implementation is selected for each sub-step. In this way, a nearly linear relationship between model size and transformation time is achieved. In addition, the thesis provides comprehensive documentation with numerous recommendations for implementing the BackEnd of a Modelica compiler. This eases the entry for new developers and supports the further development of existing algorithms; the latter is aided by a modular concept for a BackEnd pipeline. Methods for efficiently testing a new module within this pipeline are also discussed.
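The core of the transformation from equations to assignments is causalization: match each equation to one unknown, then sort the resulting assignments into evaluation order. The following is a minimal sketch under the usual structural simplification that an equation is represented only by the set of unknowns it mentions; the augmenting-path matching and the naive sort are illustrative stand-ins for the algorithms the thesis compares, not the OpenModelica implementation:

```python
def match(eqs):
    """Assign each equation a distinct unknown (augmenting-path matching)."""
    var_of_eq, eq_of_var = {}, {}
    def augment(e, seen):
        for v in eqs[e]:
            if v in seen:
                continue
            seen.add(v)
            if v not in eq_of_var or augment(eq_of_var[v], seen):
                var_of_eq[e], eq_of_var[v] = v, e
                return True
        return False
    for e in eqs:
        if not augment(e, set()):
            raise ValueError(f"structurally singular at equation {e}")
    return var_of_eq

def sort_assignments(eqs, var_of_eq):
    """Order equations so every unknown is computed before it is used."""
    solved, order, pending = set(), [], dict(eqs)
    while pending:
        for e, vs in list(pending.items()):
            if all(v in solved or v == var_of_eq[e] for v in vs):
                order.append((e, var_of_eq[e]))   # "solve e for this unknown"
                solved.add(var_of_eq[e])
                del pending[e]
                break
        else:
            raise ValueError("algebraic loop: needs tearing")
    return order

eqs = {"e1": {"x"}, "e2": {"x", "y"}, "e3": {"y", "z"}}
print(sort_assignments(eqs, match(eqs)))   # solve e1 for x, e2 for y, e3 for z
```

The thesis's observation that total transformation time can stay nearly linear in model size hinges on each such sub-step (matching, sorting, optimization) being implemented with care about its asymptotic complexity.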
43

Entwicklung eines Modelica Compiler BackEnds für große Modelle

Frenkel, Jens 16 January 2014 (has links)
The symbolic processing of mathematical models is an essential prerequisite for studying the dynamics of complex physical systems by means of numerical simulation. Compared with signal-flow-based modeling, equation-based object-oriented languages offer the advantage that the equations arising from mathematics and physics can be used in their original form to describe the model. The acausal description with equations increases the reusability of the models and reduces the modeling effort. In theory, the automated transformation of equations into assignments can be applied to models of arbitrary size and complexity. In practice, however, transforming large models produces mathematical systems with very many equations and time-dependent variables, and the resulting long execution times severely limit the applicability of symbolic processing. This thesis describes the complete process of automatically transforming a model into code that can be simulated numerically. Every sub-step is examined in detail and assessed with respect to its theoretical complexity. Suitable algorithms are implemented in the OpenModelica Compiler, their execution times are compared on models of practical engineering relevance, and the best implementation is selected for each sub-step. In this way, a nearly linear relationship between model size and transformation time is achieved. In addition, the thesis provides comprehensive documentation with numerous recommendations for implementing the BackEnd of a Modelica compiler. This eases the entry for new developers and supports the further development of existing algorithms; the latter is aided by a modular concept for a BackEnd pipeline. Methods for efficiently testing a new module within this pipeline are also discussed.
Contents: 1 Introduction; 1.1 Modeling and simulation; 1.2 Equation-based object-oriented modeling and simulation; 1.3 Motivation; 1.4 Goals, research focus, and work plan; 2 Modeling and simulation with Modelica; 2.1 The user's perspective; 2.2 The Modelica language; 2.3 The Modelica model; 2.4 Transforming the model into a simulator; 2.5 Runtime analysis of a Modelica compiler BackEnd; 3 Graphs; 3.1 Graphs for systems of equations; 3.2 Search algorithms; 3.3 Matching algorithms; 3.4 Runtime analysis of matching algorithms with Modelica; 4 Causalization; 4.1 Causalizing algebraic systems and systems of first-order ordinary differential equations; 4.1.1 Assignment; 4.1.2 Sorting; 4.2 Causalizing differential-algebraic systems; 4.2.1 The Pantelides algorithm; 4.2.2 Selecting the system's states; 4.2.3 Singular systems; 5 Symbolic optimization; 5.1 Definition and application of symbolic optimizations; 5.2 General symbolic optimization; 5.2.1 Finding and removing trivial equations; 5.2.2 Tearing and relaxation; 5.2.3 Inline integration; 5.2.4 Parallelization; 5.3 Domain-specific symbolic optimization; 5.3.1 O(n) relaxation for multibody systems; 6 Implementing the Modelica compiler BackEnd; 6.1 Implementation concepts; 6.1.1 Data structure; 6.1.2 Sub-steps of the analysis and optimization phase; 6.1.3 The BackEnd pipeline; 6.1.4 Causalization; 6.2 Testing the implementation; 7 Runtime analysis on scalable models; 7.1 ModeliMark, a software benchmark environment for Modelica compilers; 7.2 Models for runtime analysis; 7.3 Results; 8 Summary and outlook
44

Invasive and non-invasive detection of bias temperature instability

Ahmed, Fahad 27 August 2014 (has links)
Invasive and non-invasive methods of BTI monitoring and wearout preemption are proposed. We propose a novel, simple-to-use test structure for NBTI/PBTI monitoring. The proposed structure has an AC and a DC stress mode. Although both PMOS and NMOS devices are stressed during stress mode, the structure isolates the PBTI and NBTI degradation during test mode. A methodology for converting any data path into a ring oscillator (DPRO) is also presented. To avoid the performance overhead of attaching monitoring circuitry to a functional block, a non-invasive scheme for BTI monitoring is presented for sleep-transistor-based logic families. Since BTI is a critical issue for memories, a scheme for BTI monitoring of 6T-SRAM-cell-based memories is also presented: we use the DPRO concept to show how a memory system can be made to oscillate in test mode, where the frequency of oscillation is a function of the devices in the cell. After validating the proposed schemes using extensive simulations, we have also validated the results on silicon. Finally, we introduce the concept of wearout mitigation at the compiler level: using a register file as an example, we present a preemptive, compiler-directed method of wearout mitigation.
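As a rough illustration of the compiler-level idea, the sketch below assumes a toy register allocator that tracks accumulated stress (occupancy cycles) per physical register and always hands out the least-worn free register, spreading BTI stress across the register file. The class, the stress metric, and the policy are hypothetical illustrations, not the thesis's actual scheme:

```python
import heapq

class WearAwareAllocator:
    """Keeps a min-heap of (accumulated_stress, register_id) so allocation
    always picks the register that has been occupied the least so far."""
    def __init__(self, num_regs):
        self.free = [(0, r) for r in range(num_regs)]
        heapq.heapify(self.free)

    def alloc(self):
        stress, reg = heapq.heappop(self.free)   # least-worn register first
        return reg, stress

    def release(self, reg, stress, held_cycles):
        heapq.heappush(self.free, (stress + held_cycles, reg))

alloc = WearAwareAllocator(4)
for live_range in [10, 3, 7, 2, 9]:              # toy live-range lengths
    reg, s = alloc.alloc()
    alloc.release(reg, s, live_range)
print(sorted(alloc.free))                        # stress is spread across registers
```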
45

An empirical study of the influence of compiler optimizations on symbolic execution

Dong, Shiyu 18 September 2014 (has links)
Compiler optimization in the context of traditional program execution is a well-studied research area, and modern compilers typically offer a suite of optimization options. This thesis reports the first study (to our knowledge) of how standard compiler optimizations influence symbolic execution. We study 33 optimization flags of the LLVM compiler infrastructure, which are used by the KLEE symbolic execution engine. Specifically, we study (1) how different optimizations influence the performance of KLEE on the Unix Coreutils, (2) how the influence varies across two different program classes, and (3) how the influence varies across three different back-end constraint solvers. Some of our findings surprised us. For example, KLEE's strategy of applying the 33 optimizations in a pre-defined order provides sub-optimal performance for a majority of the Coreutils when using basic depth-first search; moreover, in our experimental setup, applying no optimization at all performs better for many of the Coreutils.
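A study like this implies an experiment harness of roughly the following shape: pre-optimize a bitcode file per configuration, run the symbolic executor, and record the cost. This sketch assumes conventional `opt`/`klee` command-line spellings (the exact flags, `--optimize=false` in particular, are assumptions to verify against your installed versions) and is not the thesis's actual scripts:

```python
import shutil
import subprocess
import time

def run_config(bitcode, opt_level=None):
    staged = f"{bitcode}.{opt_level or 'none'}.bc"
    if opt_level:
        # pre-optimize the bitcode with LLVM's opt tool (flag spelling assumed)
        subprocess.run(["opt", f"-{opt_level}", bitcode, "-o", staged], check=True)
    else:
        shutil.copy(bitcode, staged)             # baseline: no optimization at all
    start = time.time()
    # run KLEE with its own optimizer off, so only our staged passes apply
    subprocess.run(["klee", "--optimize=false", staged], check=True)
    return time.time() - start

for level in [None, "O1", "O2", "O3"]:
    print(level or "baseline", f"{run_config('echo.bc', level):.1f} s")
```

Wall-clock time is only one axis; a fuller harness would also compare completed paths and instruction coverage per configuration, since optimizations can change both the number and the difficulty of the queries sent to the constraint solver.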
46

Profile-driven parallelisation of sequential programs

Tournavitis, Georgios January 2011 (has links)
Traditional parallelism detection in compilers is performed by means of static analysis, more specifically data and control dependence analysis. The information available at compile time, however, is inherently limited and therefore restricts parallelisation opportunities. Furthermore, applications written in C – which represent the majority of today's scientific, embedded and system software – utilise many low-level features and an intricate programming style that forces the compiler to make even more conservative assumptions. Despite numerous proposals to handle this uncertainty at compile time using speculative optimisation and parallelisation, the software industry still lacks pragmatic approaches that extract coarse-grain parallelism to exploit the multiple processing units of modern commodity hardware. This thesis introduces a novel approach for extracting and exploiting multiple forms of coarse-grain parallelism from sequential applications written in C. We utilise profiling information to overcome the limitations of static data- and control-flow analysis, enabling more aggressive parallelisation. Profiling is performed using an instrumentation scheme operating at the Intermediate Representation (IR) level of the compiler. In contrast to existing approaches that depend on low-level binary tools and debugging information, IR profiling provides precise and direct correlation of profiling information back to the IR structures of the compiler. Additionally, our approach is orthogonal to existing automatic parallelisation approaches, so additional fine-grain parallelism may be exploited. We demonstrate the applicability and versatility of the proposed methodology in two studies that target different forms of parallelism. First, we focus on the exploitation of loop-level parallelism, which is abundant in many scientific and embedded applications. We evaluate our parallelisation strategy against the NAS and SPEC FP benchmarks on two different multi-core platforms (a shared-memory Intel Xeon SMP and a heterogeneous distributed-memory IBM Cell blade). Empirical evaluation shows that our approach not only yields significant improvements when compared with state-of-the-art parallelising compilers, but comes close to, and sometimes exceeds, the performance of manually parallelised codes. On average, our methodology achieves 96% of the performance of the hand-tuned parallel benchmarks on the Intel Xeon platform, and a significant speedup on the Cell platform. The second study addresses the problem of partially sequential loops, typically found in implementations of multimedia codecs. We develop a more powerful whole-program representation based on the Program Dependence Graph (PDG) that supports profiling, partitioning and code generation for pipeline parallelism. In addition, we demonstrate how this enhances conventional pipeline parallelisation by incorporating support for multi-level loops and pipeline stage replication in a uniform and automatic way. Experimental results on a set of complex multimedia and stream processing benchmarks confirm the effectiveness of the proposed methodology, which yields speedups of up to 4.7 on an eight-core Intel Xeon machine.
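As a flavour of what dependence profiling computes, here is a minimal sketch assuming a toy trace of (iteration, kind, address) events recorded by instrumentation: a loop whose trace shows no loop-carried flow dependences is a candidate for parallelisation. The trace format and function names are illustrative, not the thesis's IR profiler:

```python
def loop_carried_deps(trace):
    """trace: list of (iteration, 'load'|'store', address) events, in order."""
    last_writer = {}                      # address -> iteration of last store
    deps = set()
    for it, kind, addr in trace:
        if kind == "load":
            w = last_writer.get(addr)
            if w is not None and w != it:
                deps.add((w, it, addr))   # a value flows across iterations
        else:
            last_writer[addr] = it
    return deps

# a[i] = a[i-1] + 1 : iteration i loads what iteration i-1 stored
trace = [(0, "store", 100), (1, "load", 100), (1, "store", 104),
         (2, "load", 104), (2, "store", 108)]
print(loop_carried_deps(trace) or "no cross-iteration deps: loop looks parallel")
```

Because the result is observed on one profiled run rather than proven, a real system must pair it with speculation support or programmer confirmation before parallelising, which is exactly the trade-off the thesis navigates.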
47

Optimization Techniques for Energy-Aware Memory Allocation in Embedded Systems

Levy, Renato 30 September 2004 (has links)
Degree awarded (2004): DScCS, Computer Science, George Washington University / A common practice to save power and energy in embedded systems is to "put to sleep" or disable parts of the hardware. The memory system consumes a significant portion of the overall system's energy budget, so it is a natural target for energy optimization techniques. The principle of software locality makes the memory subsystem an even better choice, since all memory blocks but those immediately required can be disabled at any given time. This opportunity motivates energy optimization techniques that dynamically and selectively control the power state of the different parts of the memory system. This dissertation develops a set of algorithms and techniques, organized into a hardware/software co-development tool, that help designers apply selective powering of memory blocks to minimize energy consumption. In data-driven embedded systems, most of the data memory is used either by global static variables or by dynamic variables. Although techniques already exist for energy-aware allocation of global static arrays under certain constraints, very little work has focused on dynamic variables, which are actually more important to event-driven/data-driven embedded systems than their static counterparts. This dissertation addresses that gap, and extends and consolidates previous allocation techniques in a single framework. A formal model for memory energy optimization for dynamic and global static variables is presented, along with efficient algorithms for energy-aware allocation of variables to memory. Dependencies between generic code and data are uncovered, and this information is exploited to fine-tune a system. A framework is presented for retrieving this profile information, which is then used to design energy-aware allocation algorithms for dynamic variables, including heuristics for segmentation and control of the memory heap. By working at the assembly-code level, these techniques can be integrated into any compiler regardless of the source language. The proposed techniques were implemented and tested against data-intensive benchmarks, and experimental results indicate significant savings of up to 50% in memory system energy consumption. / Advisory Committee: Professor Bhagirath Narahari, Professor Hyoeong-Ah Choi (Chair), Professor Rahul Simha, Professor Shmuel Rotenstreich, Professor Can E. Korman, Dr. Yul Williams
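To illustrate the flavour of such an allocator, here is a minimal greedy sketch assuming a toy profile that reduces each variable to a single access interval, and a memory of identical banks that can sleep whenever no resident variable is live. Packing variables with overlapping lifetimes onto the same bank keeps the other banks asleep longer; the policy and all names are assumptions for illustration, not the dissertation's algorithms:

```python
def assign_to_banks(variables, num_banks):
    """variables: {name: (first_use, last_use)} from profiling."""
    banks = [[] for _ in range(num_banks)]
    spans = [None] * num_banks             # awake (start, end) span per bank
    for name, (lo, hi) in sorted(variables.items(), key=lambda kv: kv[1]):
        best, best_growth = None, None
        for b in range(num_banks):
            if spans[b] is None:
                growth = hi - lo           # waking an idle bank costs its whole span
            else:
                s, e = spans[b]
                growth = max(e, hi) - min(s, lo) - (e - s)
            if best_growth is None or growth < best_growth:
                best, best_growth = b, growth
        banks[best].append(name)           # place where awake time grows least
        s, e = spans[best] or (lo, hi)
        spans[best] = (min(s, lo), max(e, hi))
    return banks, spans

profile = {"buf": (0, 40), "tmp": (5, 35), "tab": (60, 90), "log": (65, 95)}
banks, spans = assign_to_banks(profile, 2)
print(banks, spans)   # overlapping lifetimes share a bank; each bank sleeps otherwise
```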
48

Development of the NoGAP CL Hardware Description Language and its Compiler

Blumenthal, Carl January 2007 (has links)
The need for a more general hardware description language aimed specifically at processors, together with vague notions and visions of how such a language would be realized, led to this thesis. The aim was to evolve and formalize a language from those visions and initial ideas, and to begin implementing the tools to use it. The language, called NoGAP Common Language, is designed to give the programmer the freedom to implement almost any processor design without being encumbered by many of the tedious tasks normally present in the creation process. While evolving the language, syntax was borrowed from C++ and Verilog to make the code and concepts easy to understand. The main advantages of NoGAP Common Language compared to RTL languages are:
- the ability to define the data paths of instructions separately from each other and have them merged automatically, along with assigned timings, to form the pipeline;
- having control paths routed automatically by activating named clauses of code coupled to control signals;
- being able to specify a decoder, where the instructions and control structures are defined, to which control signals are routed.
The compiler was implemented with C++, Bison, and Flex and utilizes an AST structure, a symbol table, and a connection graph. The AST is traversed by several functions to generate the connection graph, in which the instructions of the processor can be merged into a pipeline. The compiler is in the early stages of development and much is left to do and solve. It has become clear, though, that the concepts of NoGAP Common Language can be implemented and are not just visions.
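The connection graph is where separately specified instruction data paths get merged into one pipeline. The sketch below assumes a toy representation in which each data path is a list of (time_slot, operation) pairs; operations landing in the same slot are merged into one stage, and differing operations in a slot imply a decoder-controlled multiplexer. All of this is an illustrative reading of the idea, not NoGAP CL itself:

```python
from collections import defaultdict

def merge_datapaths(datapaths):
    """datapaths: {instr_name: [(time_slot, op), ...]} -> pipeline stages."""
    stages = defaultdict(set)
    for instr, path in datapaths.items():
        for slot, op in path:
            stages[slot].add(op)
    pipeline = {}
    for slot in sorted(stages):
        ops = stages[slot]
        # more than one distinct op in a slot means shared hardware behind a mux
        pipeline[slot] = {"ops": sorted(ops), "needs_mux": len(ops) > 1}
    return pipeline

datapaths = {
    "add": [(0, "fetch"), (1, "decode"), (2, "alu_add"), (3, "writeback")],
    "ld":  [(0, "fetch"), (1, "decode"), (2, "addr_calc"), (3, "mem_read")],
}
for slot, stage in merge_datapaths(datapaths).items():
    print(slot, stage)   # slots 0-1 are shared; slots 2-3 need decoder-driven muxes
```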
49

Increasing and Detecting Memory Address Congruence

Larsen, Samuel, Witchel, Emmett, Amarasinghe, Saman P. 01 1900 (has links)
A static memory reference exhibits a unique property when its dynamic memory addresses are congruent with respect to some non-trivial modulus. Extracting this congruence information at compile time enables new classes of program optimization. In this paper, we present methods for forcing congruence among the dynamic addresses of a memory reference. We also introduce a compiler algorithm for detecting this property. Our transformations do not require interprocedural analysis and introduce almost no overhead; as a result, they can be incorporated into real compilation systems. On average, our transformations achieve a five-fold increase in the number of congruent memory operations, and we are then able to detect 95% of these references. This success is invaluable in providing performance gains in a variety of areas. When congruence information is incorporated into a vectorizing compiler, we can increase the performance of a G4 AltiVec processor by up to a factor of two. Using the same methods, we are able to reduce energy consumption in a data cache by as much as 35%. / Singapore-MIT Alliance (SMA)
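The detection side has a compact arithmetic formulation: the largest modulus under which all observed addresses of a reference are congruent is the GCD of their pairwise differences, and the shared remainder is the offset. A minimal sketch of that property on a recorded address stream (an illustration only; the paper's compile-time algorithm must establish congruence without a full dynamic trace):

```python
from functools import reduce
from math import gcd

def congruence(addresses):
    """Largest modulus M (with offset) such that addr % M is constant."""
    base = addresses[0]
    m = reduce(gcd, (a - base for a in addresses[1:]), 0)
    if m == 0:
        return None          # all addresses identical; any modulus fits
    return m, base % m

addrs = [0x1008, 0x1028, 0x1048, 0x1088]   # every address equals 8 mod 32
print(congruence(addrs))                    # -> (32, 8)
```

Knowing, say, that a reference is always 8 mod 32 tells a vectorizing compiler its alignment relative to cache lines or vector widths, which is precisely what the AltiVec and data-cache results exploit.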
50

Programmer-assisted Automatic Parallelization

Huang, Diego 08 December 2011 (has links)
Parallel software is now required to exploit the abundance of threads and processors in modern multicore computers. Unfortunately, manual parallelization is too time-consuming and error-prone for all but the most advanced programmers. While automatic parallelization promises threaded software with little programmer effort, current auto-parallelizers are easily thwarted by pointers and other forms of ambiguity in the code. In this dissertation we profile the loops in SPEC CPU2006, categorize them in terms of available parallelism, and focus on promising loops that are not parallelized by IBM's XL C/C++ V10 auto-parallelizer. For those loops we propose methods of improved interaction between the programmer and the compiler that can facilitate their parallelization. In particular, we (i) suggest methods for the compiler to better identify parallelization blockers to the programmer; (ii) suggest methods for the programmer to provide guarantees to the compiler that overcome these blockers; and (iii) evaluate the resulting impact on performance.
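A minimal sketch of that interaction loop, assuming a toy loop summary in which each access is tagged with the pointer it goes through: the guarantees set stands in for programmer annotations such as a no-alias assertion, and nothing here is XL C/C++'s actual interface:

```python
def parallelizable(loop_accesses, guarantees=frozenset()):
    """Report aliasing blockers, honoring programmer no-alias guarantees."""
    blockers = []
    writes = [a for a in loop_accesses if a["kind"] == "store"]
    for w in writes:
        for r in loop_accesses:
            if r is w:
                continue
            pair = frozenset((w["ptr"], r["ptr"]))
            # distinct pointers the compiler cannot prove disjoint block parallelization
            if w["ptr"] != r["ptr"] and pair not in guarantees:
                blockers.append(f"may-alias: {w['ptr']} vs {r['ptr']}")
    return (not blockers), blockers

loop = [{"kind": "store", "ptr": "dst"}, {"kind": "load", "ptr": "src"}]
print(parallelizable(loop))                                          # blocked: may-alias
print(parallelizable(loop, guarantees={frozenset(("dst", "src"))}))  # ok after guarantee
```

The point of the design is the division of labor: the compiler names the exact ambiguity it cannot resolve, and the programmer supplies only the one fact needed, rather than parallelizing the loop by hand.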
