Global ETD Search

301	Automatic Data Partitioning By Hierarchical Genetic Search Shenoy, U Nagaraj 09 1900 (has links) CDAC / Languages such as High Performance Fortran (HPF) allow the programmer to indicate how the arrays used in a program are to be distributed across the local memories of a multi-computer, but they have not completely unburdened the parallel programmer from the intricacies of these architectures. To tap the full potential of these architectures, the compiler has to perform the crucial task of data partitioning automatically. This would not only unburden the programmer but would also make programs more efficient, since the compiler can be made intelligent enough to take care of the architectural nuances. The topic of this thesis, automatic data partitioning, deals with finding the best data partition for the various arrays used in the entire program, such that the cost of executing the entire program is minimized. The compiler could resort to runtime redistribution of the arrays at various points in the program if this is found profitable. Several aspects of this problem have been proven to be NP-complete, and other researchers have suggested heuristic solutions. In this thesis we propose a genetic algorithm, the Hierarchical Genetic Search algorithm, to solve this problem. Computer and Information Science Genetic Search Automatic Data Partitioning Parallelizing Compiler Multiprogramming Parallel Processing Distributed Memory Multi-Computers Distributed Memory Machines Genetic Algorithms Hierarchical Genetic Search (HGS)
302	Hyperplane Partitioning : An Approach To Global Data Partitioning For Distributed Memory Machines Prakash, S R 07 1900 (has links) Automatic Global Data Partitioning for Distributed Memory Machines (DMMs) is a difficult problem. Distributed memory machines are scalable, but since the memory is distributed across processors, the placement of data (arrays) onto the local memories of different processors becomes crucial, because any inter-processor communication for non-local data access is an order of magnitude costlier than access to local memory. Researchers have given varied solutions to this problem, most of which work for uniform dependences in loops and suggest HPF-like distributions only. For non-uniform dependences the loop was made to run sequentially. In this work, we present a partitioning strategy called Hyperplane Partitioning which also works well for loops with non-uniform dependences. In this method, the iteration space is partitioned into as many partitions as there are logical processors, in such a way that the overall inter-processor communication is minimized. The idea is to localize as many dependences as possible so that overall communication, due both to non-local data accesses and to inter-processor synchronizations, is reduced. These partitions are then induced into the data spaces of the arrays referenced in the loop. Each processor then runs its part of the iteration space, keeping the data partition that it owns locally. Any non-local data access is implemented by inter-processor communication at run-time. Hyperplane Partitioning is also extended to a sequence of loops. This is done by first finding the Best Local Distribution (BLD) for every loop and then finding the best way of grouping adjacent loops (just for finding the data partition) which gives the best global data partition. 
This sequence of distributions/redistributions is found by constructing a data structure called the Data Distribution Tree (DDT) and finding the least-cost path from the source to any of the leaf nodes in the DDT. The costs for the edges come from the communication cost incurred while running a loop with a particular distribution and the cost of redistribution to suit the requirement at the next loop. For this, a communication cost estimator is developed which works well for fewer dimensions. To handle complete programs we use a heuristic to find the best global distribution for the entire program. Some optimizations are studied: message optimization to reduce the number of messages sent across processors, time optimization which is done by uniform scheduling across processors, and space optimization to keep only the part of the array space that a processor owns in its local memory. Hyperplane Partitioning is also implemented using a synchronization algorithm that handles non-local memory access while obeying data dependence constraints. The algorithm is also proved to be correct. The target machine is the IBM-SP2, using PVM as the message passing library. The performance of the tool is evaluated on some standard benchmarks (ADI and RHS) and also on some programs designed by us to show the specific merits of the tool. The results show that loops which have non-uniform dependences can also be run on DMMs with good speed-ups. Computer and Information Science Parallelizing Compiler Automatic Data Partitioning Hyper-plane Partitioning Distributed Memory Machine Electronic Data Processing Multiprogramming Distributed Memory Multiprocessors Distributed Memory Multicomputers
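The least-cost search over distributions and redistributions described in entry 302 can be illustrated with a small sketch. This is not the thesis's DDT construction: it assumes one candidate distribution per loop (no grouping of adjacent loops), and the names `run_cost`, `redist_cost` and `best_distribution_sequence` as well as all cost figures are hypothetical.

```python
from typing import Dict, List, Tuple


def best_distribution_sequence(
    run_cost: List[Dict[str, float]],
    redist_cost: Dict[Tuple[str, str], float],
) -> Tuple[float, List[str]]:
    """Return (total cost, chosen distribution per loop).

    run_cost[i][d]      : assumed communication cost of loop i under distribution d
    redist_cost[(a, b)] : assumed cost of redistributing from a to b (a != b)
    """
    # best[d] = (cheapest cost of finishing the current loop with distribution d, path)
    best = {d: (c, [d]) for d, c in run_cost[0].items()}
    for costs in run_cost[1:]:
        nxt = {}
        for d, c in costs.items():
            # Pick the cheapest predecessor, paying for redistribution if it differs.
            prev_cost, prev_path = min(
                (
                    (pc + (0.0 if p == d else redist_cost[(p, d)]), pp)
                    for p, (pc, pp) in best.items()
                ),
                key=lambda t: t[0],
            )
            nxt[d] = (prev_cost + c, prev_path + [d])
        best = nxt
    # The cheapest leaf corresponds to the least-cost source-to-leaf path.
    return min(best.values(), key=lambda t: t[0])


# Three loops, two candidate distributions; redistribution costs 2 either way.
loops = [{"row": 1, "col": 5}, {"row": 6, "col": 1}, {"row": 1, "col": 4}]
cheap = {("row", "col"): 2, ("col", "row"): 2}
total, plan = best_distribution_sequence(loops, cheap)
```

With cheap redistribution the plan switches distribution between loops; if redistribution is made expensive, the same search keeps a single distribution throughout.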
303	Object-Oriented Development for Reconfigurable Architectures Fröhlich, Dominik 30 November 2009 (has links) (PDF) Reconfigurable hardware architectures have been available now for several years. Yet application development for such architectures is still a challenging and error-prone task, since the methods, languages, and tools being used for development are inappropriate to handle the complexity of the problem. This thesis introduces a novel approach that tackles the complexity challenge by raising the level of abstraction to the system level and increasing the degree of automation. The approach is centered around the paradigms of object-orientation, platforms, and modeling. An application and all platforms being used for its design, implementation, and deployment are modeled with objects using UML and an action language. The application model is then transformed into an implementation, whereby the transformation is steered by the platform models. In this thesis solutions for the relevant problems behind this approach are discussed. It is shown how UML can be used for complete and precise modeling of applications and platforms. Application development is done at the system level using a set of well-defined, orthogonal platform models. The core features of object-orientation (data abstraction, encapsulation, inheritance, and polymorphism) are fully supported. Novel algorithms are presented that allow for an automatic mapping of such application models to the target architecture. In this context, the problems of platform mapping, estimation of implementation characteristics, and synthesis of UML models are discussed. The thesis explores the utilization of platform models for generation of highly optimized implementations in an automatic yet adaptable way. The approach is evaluated by a number of relevant applications. The execution of the generated implementations is supported by a run-time service. 
This service manages the hardware configurations and objects comprising the application. Moreover, it serves as broker for hardware objects. The efficient management of configurations and objects at run-time is discussed and optimized life cycles for these entities are proposed. Mechanisms are presented that make the approach portable among different physical hardware architectures. Further, this thesis presents UML profiles and example platforms that support system-level design. These extensions are embodied in a novel type of model compiler. The compiler is accompanied by an implementation of the run-time service. Both have been used to evaluate and improve the presented concepts and algorithms. CASHE Softwareentwicklung Compiler Modell Hardware Field programmable gate array Globale Optimierung Hierarchische Optimierung Modellierung Synthese Systemplattform ddc:004 rvk:ST 230 rvk:ST 190
304	Architecture and Compiler Support for Leakage Reduction Using Power Gating in Microprocessors Roy, Soumyaroop 31 August 2010 (has links) Power gating is a technique commonly used for runtime leakage reduction in digital CMOS circuits. In microprocessors, power gating can be implemented by using sleep transistors to selectively deactivate circuit modules when they are idle during program execution. In this dissertation, a framework for power gating arithmetic functional units in embedded microprocessors with architecture and compiler support is proposed. During compile time, program regions are identified where one or more functional units are idle, and sleep instructions are inserted into the code so that those units can be put to sleep during program execution. Subsequently, when the need for a unit is detected during the instruction decode stage, it is woken up with the help of hardware control signals. For a set of benchmarks from the MiBench suite, leakage energy savings of 27% and 31% are achieved (based on a 70 nm PTM model) in the functional units of a processor, modeled on the ARM architecture, with and without floating point units, respectively. Further, the impact of traditional performance-enhancing compiler optimizations on the amount of leakage savings obtained with this framework is studied through analysis and simulations. Based on the observations, a leakage-aware compilation flow is derived that improves the effectiveness of this framework. It is observed that, through the use of various compiler optimizations, additional savings of around 15%, and even up to 9X leakage energy savings in individual functional units, are possible. Finally, in the context of multi-core processors supporting multithreading, three different microarchitectural techniques, for different multithreading schemes, are investigated for state-retentive power gating of register files. 
In an in-order core, when a thread gets blocked due to a memory stall, the corresponding register file can be placed in a low leakage state. When the memory stall gets resolved, the register file is activated so that it may be accessed again. The overhead due to wake-up latency is completely hidden in two of the schemes, while it is hidden for the most part in the third. Experimental results on multiprogrammed workloads comprised of SPEC 2000 integer benchmarks show that, in an 8-core processor executing 64 threads, the average leakage savings in the register files, modeled in FreePDK 45 nm MTCMOS technology, are 42% in coarse-grained multithreading, while they are between 7% and 8% in fine-grained and simultaneous multithreading. The contributions of this dissertation represent a significant advancement in the quest for reducing leakage energy consumption in microprocessors with minimal degradation in performance. Compiler Directed Power Gating Microarchitectural Techniques Embedded Microprocessors Multithreading Multiprocessing Multicore Niagara CGMT FGMT SMT GCC SUIF MachineSUIF M5 American Studies Arts and Humanities Computer Engineering Computer Sciences
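The compile-time step of entry 304 (finding regions where a functional unit is idle and inserting sleep instructions) can be sketched roughly as follows. This is an illustration only, not the dissertation's compiler pass: it assumes straight-line code, a hypothetical `sleep <unit>` marker, and a hypothetical `min_idle` threshold below which gating is assumed not to pay off. No wake-up marker is emitted, since the abstract states that wake-up is triggered by hardware at decode.

```python
from typing import List, Set


def insert_sleep(trace: List[Set[str]], unit: str, min_idle: int) -> List[str]:
    """Annotate a straight-line instruction sequence with sleep markers.

    trace[i] is the set of functional units used by instruction i. A
    `sleep <unit>` marker is emitted before every maximal idle run of at
    least `min_idle` instructions; shorter runs are left untouched.
    """
    out: List[str] = []
    i, n = 0, len(trace)
    while i < n:
        if unit in trace[i]:
            out.append(f"instr{i}")
            i += 1
            continue
        # Find the end of the maximal run in which `unit` is idle.
        j = i
        while j < n and unit not in trace[j]:
            j += 1
        if j - i >= min_idle:
            out.append(f"sleep {unit}")
        out.extend(f"instr{k}" for k in range(i, j))
        i = j
    return out


# The multiplier is idle for three instructions, then for only one.
trace = [{"mul"}, {"add"}, {"add"}, {"add"}, {"mul"}, {"add"}, {"mul"}]
annotated = insert_sleep(trace, "mul", min_idle=3)
```

Only the three-instruction idle run receives a sleep marker; the single idle instruction later in the sequence is too short under the assumed threshold.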
305	Development of Architectures and Compiler Techniques for Memory Management in Embedded Systems Μηλιδώνης, Αθανάσιος 21 November 2007 (has links) - / - 004.5
306	Sub-Polyhedral Compilation using (Unit-)Two-Variables-Per-Inequality Polyhedra Upadrasta, Ramakrishna 13 March 2013 (has links) (PDF) The goal of this thesis is to design algorithms that run with better complexity when compiling or parallelizing loop programs. The framework within which our algorithms operate is the polyhedral model of compilation, which has been successful in the design and implementation of complex loop nest optimizers and parallelizing compilers. The algorithmic complexity and scalability limitations of this framework remain one important weakness. We address it by introducing sub-polyhedral compilation using (Unit-)Two-Variable-Per-Inequality or (U)TVPI polyhedra, namely polyhedra with restricted constraints of the type $a x_i + b x_j \le c$ ($\pm x_i \pm x_j \le c$ in the unit case). A major focus of our sub-polyhedral compilation is the introduction of sub-polyhedral scheduling, where we propose a technique for scheduling using (U)TVPI polyhedra. As part of this, we introduce algorithms that can be used to construct under-approximations of the systems of constraints resulting from affine scheduling problems. This technique relies on simple polynomial time algorithms to under-approximate a general polyhedron into (U)TVPI polyhedra. These under-approximation algorithms are generic enough that they can be used for many kinds of loop parallelization scheduling problems, reducing each of their complexities to asymptotically polynomial time. We also introduce sub-polyhedral code generation, where we propose algorithms that use the improved complexities of (U)TVPI sub-polyhedra in polyhedral code generation. For this problem, we show that the exponentialities associated with the widely used polyhedral code generators could be reduced to polynomial time using the improved complexities of (U)TVPI sub-polyhedra. The sub-polyhedral scheduling techniques presented above are evaluated in an experimental framework. 
For this, we modify the state-of-the-art PLuTo compiler which can parallelize for multi-core architectures using permutation and tiling transformations. We show that using our scheduling technique, the above under-approximations yield polyhedra that are non-empty for 10 out of 16 benchmarks from the Polybench (2.0) kernels. Solving the under-approximated system leads to asymptotic gains in complexity, and shows practically significant improvements when compared to a traditional LP solver. We also verify that code generated by our sub-polyhedral parallelization prototype matches the performance of PLuTo-optimized code when the under-approximation preserves feasibility. [INFO:INFO_OH] Computer Science/Other [INFO:INFO_OH] Informatique/Autre Affine scheduling Approximation algorithms Compiler optimizations Compilers Loop transformations Optimization Parallelism Asymptotic complexity Code generation
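The constraint shapes named in entry 306 can be made concrete with a small classifier. This sketch only checks which class a single linear constraint falls into; it does not implement the thesis's under-approximation algorithms, and the function name `classify` and the dictionary encoding of a constraint are assumptions made for illustration.

```python
from typing import Dict


def classify(coeffs: Dict[str, int]) -> str:
    """Classify the constraint sum(coeffs[v] * v) <= c by its shape.

    Returns "UTVPI" (at most two variables, coefficients +1 or -1),
    "TVPI" (at most two variables, arbitrary coefficients), or "general".
    The right-hand side c does not affect the class, so it is not passed in.
    """
    nonzero = [a for a in coeffs.values() if a != 0]
    if len(nonzero) > 2:
        return "general"
    if all(a in (1, -1) for a in nonzero):
        return "UTVPI"
    return "TVPI"


shapes = [
    classify({"i": 1, "j": -1}),          # i - j <= c
    classify({"i": 2, "j": -3}),          # 2i - 3j <= c
    classify({"i": 1, "j": 1, "k": 1}),   # i + j + k <= c
]
```

A constraint such as i + j + k <= c falls outside both restricted classes, which is the kind of constraint an under-approximation would have to replace by (U)TVPI ones.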
307	JCML - Java Card Modeling Language: Definition and Implementation Souza Neto, Plácido Antônio de 06 September 2007 (has links) Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Formal methods should be used to specify and verify on-card software in Java Card applications. Furthermore, Java Card programming style requires runtime verification of all input conditions for all on-card methods, where the main goal is to preserve the data in the card. Design by contract, and in particular the JML language, is an option for this kind of development and verification, as runtime verification is part of the Design by contract method implemented by JML. However, JML and its currently available tools for runtime verification were not designed with Java Card limitations in mind and are not Java Card compliant. In this thesis, we analyze how much of this situation is really intrinsic to Java Card limitations and how much is just a matter of a complete re-design of JML and its tools. We propose the requirements for a new language which is Java Card compliant and indicate the lines along which a compiler for this language should be built. JCML strips from JML non-Java Card aspects such as concurrency and unsupported types. This would not be enough, however, without a great effort in optimizing the verification code generated by its compiler, as this verification code must run on the card. The JCML compiler, although much more restricted than the one for JML, is able to generate Java Card compliant verification code for some lightweight specifications. 
In conclusion, we present a Java Card compliant variant of JML, JCML (Java Card Modeling Language), with a preliminary version of its compiler. Formal Methods Java Card JML JCML Runtime Verification Compiler
308	A network transparent, retained mode multimedia processing framework for the Linux operating system environment Bahmann, Helge 27 July 2009 (has links) (PDF) The thesis presents a multimedia framework for Linux that, unlike earlier work, is based on the ideas of "retained-mode processing" and "lazy evaluation": instead of executing transformations immediately, an abstract representation of all media elements is built. "Renderer" drivers act as translators that convert this representation into concrete operations at run time, and the data model permits numerous optimizations that reduce the number of steps or minimize communication. This allows a greatly simplified programming model together with a gain in efficiency. "Renderer" drivers can use the local processor to execute transformations, or they can delegate the operations. The thesis presents an extension of the X Window System with mechanisms for media processing, as well as a "renderer" driver that uses them to delegate processing. Multimedia X Window System Distributed system Framework <computer science> Operating system LINUX Just-in-time compiler Scene graph ddc:004 rvk:ST 325 rvk:ST 261
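The retained-mode idea in entry 308 (record transformations in an abstract representation, let a renderer turn them into concrete operations later) can be sketched in a few lines. This is not the framework's API: the classes `Scale` and `Media`, the function `render`, and the textual "operations" it returns are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Scale:
    factor: float


@dataclass
class Media:
    """Abstract description of a media element: a source plus pending operations."""
    source: str
    ops: Tuple[Scale, ...] = ()

    def scale(self, factor: float) -> "Media":
        # Nothing is executed here; the request is only recorded.
        return Media(self.source, self.ops + (Scale(factor),))


def render(media: Media) -> List[str]:
    """Translate the retained description into concrete steps.

    Consecutive scalings are fused into one step, and a net factor of 1.0
    is dropped entirely, as a stand-in for step-reducing optimizations.
    """
    total = 1.0
    for op in media.ops:
        total *= op.factor
    steps = [f"load {media.source}"]
    if total != 1.0:
        steps.append(f"scale by {total}")
    return steps


clip = Media("clip").scale(2.0).scale(0.25)
steps = render(clip)
```

Because evaluation is deferred until `render`, the two recorded scalings collapse into a single concrete operation, which is the kind of saving an immediate-mode API cannot make.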
309	The C# Interface of the Reference-Attribute-Grammar-Controlled Graph Rewriting Library RACR: Overview, Application and Implementation Langner, Daniel, Bürger, Christoff 04 July 2018 (has links) (PDF) This report presents RACR-NET, a C# interface to RACR, the reference-attribute-grammar-controlled graph rewriting library. RACR-NET makes the declarative, dynamic language specification, instantiation, and evaluation mechanisms of the RACR Scheme library available in object-oriented programming. In particular, this includes the automatic incremental evaluation of attribute-based semantic analyses and thus the automatic caching of parameterized function methods. Graph rewrites correspond to state changes of object instances and to the invalidation of derived computations. The report focuses on the object-oriented programming interface of RACR-NET, its practical application, and its implementation. It is a reference manual for RACR-NET users and developers. Attribute Grammar Reference Attribute Grammar Graph Rewriting Term Rewriting Incremental Attribute Evaluation Compiler Construction Computer Science ddc:004 rvk:SS 5514
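The combination of cached attribute evaluation and invalidation on rewrite described in entry 309 can be illustrated with a toy tree. This is not RACR or RACR-NET code: the class `Node`, the attribute `total`, and the rule that a rewrite invalidates exactly the ancestors' caches are simplifying assumptions (a real system would track dependencies dynamically).

```python
from typing import List, Optional


class Node:
    """Tree node with one cached attribute, `total`, derived from its subtree."""

    def __init__(self, value: int, children: Optional[List["Node"]] = None):
        self.value = value
        self.parent: Optional["Node"] = None
        self.children = children or []
        for child in self.children:
            child.parent = self
        self._cache: Optional[int] = None
        self.evaluations = 0  # counts real (uncached) evaluations

    def total(self) -> int:
        # Evaluate on demand and cache the result until it is invalidated.
        if self._cache is None:
            self.evaluations += 1
            self._cache = self.value + sum(c.total() for c in self.children)
        return self._cache

    def rewrite(self, value: int) -> None:
        """Change this node and invalidate only the caches that depend on it."""
        self.value = value
        node: Optional["Node"] = self
        while node is not None:
            node._cache = None
            node = node.parent


a, b = Node(1), Node(2)
root = Node(10, [a, b])
first = root.total()
a.rewrite(5)
second = root.total()
```

After the rewrite of `a`, only `a` and `root` are re-evaluated; the cached value of the untouched sibling `b` is reused.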
310	Effects of Error Messages on a Student’s Ability to Understand and Fix Programming Errors January 2017 (has links) Assemblers and compilers provide feedback to a programmer in the form of error messages. These error messages become input to the debugging model of the programmer. To fix an error, the programmer must first locate the error in the program, understand what is causing it, and finally resolve it. Error messages play an important role in all three stages of fixing errors. This thesis studies the effects of error messages in the context of teaching programming. Given an error message, this work investigates how it affects a student’s way of 1) understanding the error, and 2) fixing the error. As part of the study, three error message types were developed (Default, Link and Example) to better understand the effects of error messages. The Default type provides an assembler-centric single-line error message, the Link type provides a program-centric detailed error description with a hyperlink for more information, and the Example type provides a program-centric detailed error description with a relevant example. All these error message types were developed for assembly language programming. A think-aloud programming exercise was conducted as part of the study to capture the student programmer’s knowledge model. Different codes were developed to analyze the data collected during the think-aloud exercise. After transcribing, coding, and analyzing the data, it was found that the Link type of error message helped to fix the error in less time and with fewer steps. Among the three types, the Link type of error message also resulted in a significantly higher ratio of correct to incorrect steps taken by the programmer to fix the error. 
/ Dissertation/Thesis / Masters Thesis Software Engineering 2017 Computer science Educational technology Science education Compiler & Assembler Error Messages Computer Education Progressive Learning Platform Qualitative Analysis Think Aloud Programming Verbal Protocol Analysis
