Global ETD Search

1	High performance Monte Carlo computation for finance risk data analysis Zhao, Yu January 2013 Finance risk management has been playing an increasingly important role in the finance sector, both in analysing finance data and in preventing potential crises. It has been widely recognised that Value at Risk (VaR) is an effective method for finance risk management and evaluation. This thesis conducts a comprehensive review of a number of VaR methods and discusses in depth their strengths and limitations. Among these VaR methods, Monte Carlo simulation and analysis has proven to be the most accurate VaR method in finance risk evaluation due to its strong modelling capabilities. However, one major challenge in Monte Carlo analysis is its high computing complexity of O(n²). To speed up the computation in Monte Carlo analysis, this thesis parallelises Monte Carlo using the MapReduce model, which has become a major software programming model in support of data intensive applications. MapReduce consists of two functions, Map and Reduce. The Map function segments a large data set into small data chunks and distributes these chunks among a number of computers for processing in parallel, with a Mapper processing one data chunk on a computing node. The Reduce function collects the results generated by these Map nodes (Mappers) and generates an output. The parallel Monte Carlo is evaluated initially in a small scale MapReduce experimental environment, and subsequently in a large scale simulation environment. Both experimental and simulation results show that the MapReduce based parallel Monte Carlo is considerably faster than the sequential Monte Carlo in computation, while maintaining the same level of accuracy. In data intensive applications, moving huge volumes of data among the computing nodes could incur high overhead in communication. To address this issue, this thesis further considers data locality in the MapReduce based parallel Monte Carlo, and evaluates the impact of data locality on computation performance.
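The Map and Reduce split described in this abstract can be illustrated with a minimal single-process sketch. It is not the thesis's implementation: the function names, the lognormal one-day return model, and the parameter values (`mu`, `sigma`, `value`) are all assumptions chosen for illustration, and the mappers run sequentially here rather than on separate nodes.

```python
import math
import random
from functools import reduce


def map_chunk(seed, n_paths, mu=0.0005, sigma=0.02, value=1_000_000.0):
    """Mapper (hypothetical): simulate one chunk of one-day portfolio losses."""
    rng = random.Random(seed)  # one independent random stream per chunk
    # Assumed model: the one-day log-return is normal(mu, sigma).
    return [value * (1.0 - math.exp(rng.gauss(mu, sigma))) for _ in range(n_paths)]


def reduce_losses(acc, chunk):
    """Reducer (hypothetical): merge the loss samples produced by the mappers."""
    return acc + chunk


def value_at_risk(losses, confidence=0.99):
    """Empirical VaR: the loss at the given quantile of the simulated sample."""
    ordered = sorted(losses)
    index = min(len(ordered) - 1, int(confidence * len(ordered)))
    return ordered[index]


def parallel_var(n_chunks=8, paths_per_chunk=5000, confidence=0.99):
    # In a real MapReduce deployment each map_chunk call would run on its own node.
    chunks = map(lambda seed: map_chunk(seed, paths_per_chunk), range(n_chunks))
    losses = reduce(reduce_losses, chunks, [])
    return value_at_risk(losses, confidence)


if __name__ == "__main__":
    print(f"99% one-day VaR estimate: {parallel_var():.2f}")
```

Because each chunk is generated from its own seed, the mappers share no state, which is what makes the simulation straightforward to distribute; only the loss samples need to travel to the reducer.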
2	A Parallelizing Compiler Based on Partial Evaluation Surati, Rajeev 01 July 1993 We constructed a parallelizing compiler that utilizes partial evaluation to achieve efficient parallel object code from very high-level data independent source programs. On several important scientific applications, the compiler attains parallel performance equivalent to or better than the best observed results from the manual restructuring of code. This is the first attempt to capitalize on partial evaluation's ability to expose low-level parallelism. New static scheduling techniques are used to utilize the fine-grained parallelism of the computations. The compiler maps the computation graph resulting from partial evaluation onto the Supercomputer Toolkit, an eight VLIW processor parallel computer. VLIW partial evaluation register allocation parallel scheduling parallelizing compilers
3	A Run-Time Loop Parallelization Technique on Shared-Memory Multiprocessor Systems Wu, Chi-Fan 06 July 2000 High performance computing power is important for the advanced calculations of current scientific applications. A multiprocessor system obtains its high performance from the fact that some computations can proceed in parallel. A parallelizing compiler can take a sequential program as input and automatically translate it into parallel form for the target multiprocessor system. But when loops contain arrays with irregular, nonlinear or dynamic access patterns, no current parallelizing compiler can determine at compile time whether data dependences exist. A run-time parallel algorithm is therefore needed to determine dependences and extract the potential parallelism of such loops. In this thesis, we propose an efficient run-time parallelization technique that computes a proper parallel execution schedule for those loops. This new method first detects the immediate predecessor iterations of each loop iteration and constructs an immediate predecessor table, then efficiently schedules all loop iterations into wavefronts for parallel execution. According to both theoretical analysis and experimental results, our new run-time parallelization technique achieves high speedup with low processing overhead. Furthermore, the technique is well suited to implementation on multiprocessor systems because of its high scalability. Run-time parallelization Parallelizing compiler Multiprocessor system Wavefront scheduling
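The two phases named in this abstract, building an immediate predecessor table and then grouping iterations into wavefronts, can be sketched as follows. This is a simplified sequential illustration rather than the thesis's algorithm: the function names and the representation of each iteration as lists of read and written array indices are assumptions.

```python
def predecessor_table(reads, writes):
    """For each iteration, list the earlier iterations it directly depends on.

    reads[i] and writes[i] hold the array indices that iteration i reads and
    writes (a hypothetical input format, known only at run time).
    """
    last_writer = {}          # element -> most recent iteration that wrote it
    readers_since_write = {}  # element -> iterations that read it since that write
    table = []
    for i, (r, w) in enumerate(zip(reads, writes)):
        preds = set()
        for x in r:
            if x in last_writer:
                preds.add(last_writer[x])                 # flow dependence
        for x in w:
            if x in last_writer:
                preds.add(last_writer[x])                 # output dependence
            preds.update(readers_since_write.get(x, ()))  # anti dependence
        for x in r:
            readers_since_write.setdefault(x, set()).add(i)
        for x in w:
            last_writer[x] = i
            readers_since_write[x] = set()
        table.append(sorted(preds))
    return table


def wavefronts(table):
    """Place each iteration one wavefront after its latest predecessor."""
    level = []
    for preds in table:
        level.append(1 + max((level[p] for p in preds), default=-1))
    fronts = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lvl in enumerate(level):
        fronts[lvl].append(i)
    return fronts


if __name__ == "__main__":
    # Loop body A[w[i]] = f(A[r[i]]) with index arrays known only at run time.
    reads = [[0], [1], [4], [2]]
    writes = [[1], [2], [1], [3]]
    table = predecessor_table(reads, writes)
    print(table)              # [[], [0], [0, 1], [1]]
    print(wavefronts(table))  # [[0], [1], [2, 3]]
```

Iterations within one wavefront have no dependences on each other and can run in parallel; successive wavefronts are separated by a barrier.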
4	A Parallelizing Compiler for Fortran Janaki, S 08 1900 With the advent of Distributed Memory Machines (DMMs), numerous efforts have been undertaken to ease the work of programmers on these systems. Data parallel languages like Fortran D, Vienna Fortran, High Performance Fortran and C+ allow the user to specify data distribution across processors with directives, and the compilers for these languages use the directives to compile the program into SPMD code. A number of old programs are still in use, and rewriting them in the new data parallel languages is costly. Most of the work on these parallelizing compilers concentrates on efficient data communication between the processors. With the advancement in technology, data communication time is also decreasing. This allows bigger programs to execute in the same time span. The finite resources of a DMM limit the size of the problem that can be run. Improving the memory usage for a problem will hence allow us to run bigger problems. Further, as communication speed increases, the overhead caused by house-keeping computations, such as global index to local index transformation and owner processor computation, will degrade the performance of the resultant code. Hence a uniform and efficient method for these computations also becomes a necessity. We have implemented the parallelizing parts of a compiler using the SUIF compiler system, which accepts programs written in Fortran77 with directives to the compiler as comments. The output of the compiler is an SPMD C program, with embedded PVM calls for message communication between the processors. We have also proposed algorithms to improve data communication and minimize memory usage in the output code. A uniform method for performing owner processor computations and global-to-local transformations has also been implemented. Computer and Information Science Parallelizing Compiler FORTRAN Compiler Distributed Memory Machines
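The house-keeping computations mentioned in this abstract (owner processor computation and global-to-local index transformation) can be illustrated for one common case. The sketch below assumes a one-dimensional block distribution with zero-based indices; the abstract does not say which distributions the compiler supports, and all function names are hypothetical.

```python
def block_size(n, p):
    """Elements per processor when n elements are block-distributed over p processors."""
    return -(-n // p)  # ceiling division


def owner(g, n, p):
    """Processor that owns global index g."""
    return g // block_size(n, p)


def global_to_local(g, n, p):
    """Index of global element g within its owner's local array."""
    return g % block_size(n, p)


def local_to_global(proc, l, n, p):
    """Inverse mapping: global index of local element l on processor proc."""
    return proc * block_size(n, p) + l


if __name__ == "__main__":
    n, p = 10, 4  # 10 array elements over 4 processors, blocks of 3
    for g in range(n):
        print(g, owner(g, n, p), global_to_local(g, n, p))
```

These mappings run on every distributed array access in generated SPMD code, which is why the abstract treats their cost as significant once communication becomes fast.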
5	SAGE: An Automatic Analyzing and Parallelizing System to Improve Performance and Reduce Energy on a New High-Performance SoC Architecture: Processor-in-Memory Chu, Slo-Li 04 October 2002 Continuous improvements in semiconductor fabrication density are enabling new classes of System-on-a-Chip (SoC) architectures that combine extensive processing logic with high-density memory. Such architectures are generally called Processor-in-Memory or Intelligent Memory and can support high-performance computing by reducing the performance gap between the processor and the memory. This architecture combines various processors in a single system. These processors are characterized by their computational and memory-access capabilities in terms of performance and energy consumption. The two main problems addressed here are how to improve the performance and how to reduce the energy consumption of applications running on Processor-in-Memory architectures. A novel strategy must therefore be developed to identify the capabilities of the different processors and dispatch the most appropriate jobs to them, so that they are fully exploited. This study proposes a novel automatic source-to-source parallelizing system, called SAGE, to exploit the advantages of Processor-in-Memory architectures. Unlike conventional iteration-based parallelizing systems, SAGE adopts statement-based analytical approaches. The strategy of the SAGE system, which decomposes the original program into blocks and produces a feasible execution schedule for the host and memory processors, is also investigated. Several techniques, including statement splitting, weight evaluation, performance scheduling and energy reduction scheduling, are designed and integrated into the SAGE system to automatically transform Fortran source programs, either improving the performance of the program or reducing the energy it consumes when executed on a Processor-in-Memory architecture. This thesis provides detailed techniques and discusses the experimental results of real benchmarks that are transformed by the SAGE system and target the Processor-in-Memory architecture. SoC Processor-in-Memory architecture statement-based automatic parallelizing compiler energy reduction
6	Efficient state space exploration for parallel test generation Ramasamy Kandasamy, Manimozhian 03 September 2009 Automating the generation of test cases for software is an active area of research. Specification based test generation is an approach in which a formal representation of a method is analyzed to generate valid test cases. Constraint solving and state space exploration are important aspects of specification based test generation. One problem with specification based testing is that the size of the state space explodes when the approach is applied to code of practical size. Hence finding ways to reduce the number of candidates to explore within the state space is important to make this approach practical in industry. Korat is a tool which generates test cases for Java programs based on predicates that validate the inputs to the method. Ongoing research aims to increase the tool's effectiveness in handling large state spaces. Parallelizing Korat and minimizing the exploration of invalid candidates are the active research directions. This report surveys the basic algorithms of Korat, PKorat, and Fast Korat. PKorat is a parallel version of Korat that aims to take advantage of available multi-processor and multicore systems. Fast Korat implements four optimizations which reduce the number of candidates explored to generate valid candidates and reduce the amount of time required to explore each candidate. This report also presents the execution time results for generating test candidates for binary tree, doubly linked list, and sorted singly linked list, from their respective predicates. Automated test generation specification based testing Parallelizing Korat state space exploration java algorithms data structures MPI TACC
7	Automatic Data Partitioning By Hierarchical Genetic Search Shenoy, U Nagaraj 09 1900 CDAC / The introduction of languages like High Performance Fortran (HPF), which allow the programmer to indicate how the arrays used in the program are to be distributed across the local memories of a multi-computer, has not completely unburdened the parallel programmer from the intricacies of these architectures. In order to tap the full potential of these architectures, the compiler has to perform the crucial task of data partitioning automatically. This would not only unburden the programmer but would also make the programs more efficient, since the compiler can be made intelligent enough to take care of the architectural nuances. The topic of this thesis, automatic data partitioning, deals with finding the best data partition for the various arrays used in the entire program such that the cost of execution of the entire program is minimized. The compiler could resort to runtime redistribution of the arrays at various points in the program if found profitable. Several aspects of this problem have been proven to be NP-complete. Other researchers have suggested heuristic solutions to this problem. In this thesis we propose a genetic algorithm, the Hierarchical Genetic Search algorithm, to solve it. Computer and Information Science Genetic Search Automatic Data Partitioning Parallelizing Compiler Multiprogramming Parallel Processing Distributed Memory Multi-Computers Distributed Memory Machines Genetic Algorithms Hierarchical Genetic Search (HGS)
