  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Software Design of Estimation Method of Program Data Locality

Fan, Yi-Da 11 September 2008
Data accesses consume considerable time during program execution. Reducing data access time at compile time can therefore shorten overall execution time effectively. To that end, we designed software that implements a data locality estimation method for use in a program optimizer. With these estimates, the optimizer can reduce the number of main memory block accesses and improve overall program performance. In this research, we implemented the software design of the data locality estimation method. It includes two estimation models: one estimates the number of access matches in memory blocks, and the other estimates the number of memory block accesses. We carried out experiments to verify the accuracy of the locality estimation method.
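To make the two quantities named in the abstract concrete, the sketch below counts them for an explicit access trace. It is an illustration only, not the thesis's estimation models: it assumes a single-block buffer (an access "matches" when it falls in the same block as the previous access), and the function names, the trace generators, and the block size are all hypothetical.

```python
def count_block_accesses(addresses, block_size):
    """Return (matches, block_accesses) for an address trace.

    Assumption for illustration: only the most recently used block is
    buffered, so any change of block costs one memory block access.
    """
    matches = 0
    block_accesses = 0
    current_block = None
    for addr in addresses:
        block = addr // block_size
        if block == current_block:
            matches += 1          # same block as the previous access
        else:
            block_accesses += 1   # a different block has to be fetched
            current_block = block
    return matches, block_accesses


def row_major_trace(rows, cols):
    """Element addresses of a rows x cols array visited row by row."""
    return [r * cols + c for r in range(rows) for c in range(cols)]


def column_major_trace(rows, cols):
    """Element addresses of the same array visited column by column."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


if __name__ == "__main__":
    print(count_block_accesses(row_major_trace(4, 8), 8))
    print(count_block_accesses(column_major_trace(4, 8), 8))
```

For a 4 x 8 array with 8 elements per block, the row-by-row traversal yields 28 matches and 4 block accesses, while the column-by-column traversal yields 0 matches and 32 block accesses. A gap of this kind is what a locality estimate would expose to an optimizer.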
2

Optimizing locality and parallelism through program reorganization

Krishnamoorthy, Sriram 07 January 2008
No description available.
3

A Data-Locality Aware Mapping and Scheduling Framework for Data-Intensive Computing

Khanna, Gaurav 11 September 2008
No description available.
4

Optimizing Sparse Matrix-Matrix Multiplication on a Heterogeneous CPU-GPU Platform

Wu, Xiaolong 16 December 2015
Sparse matrix-matrix multiplication (SpMM) is a fundamental operation over irregular data and is widely used in graph algorithms such as finding minimum spanning trees and shortest paths. In this work, we present a hybrid CPU- and GPU-based parallel SpMM algorithm to improve the performance of SpMM. First, we improve data locality by element-wise multiplication. Second, we exploit the ordered property of row indices to sort partially, instead of fully sorting all triples by row and column indices. Finally, through a hybrid CPU-GPU approach using a two-level pipelining technique, our algorithm better exploits a heterogeneous system. Compared with the state-of-the-art SpMM methods in the cuSPARSE and CUSP libraries, our approach achieves average speedups of 1.6x and 2.9x, respectively, on nine representative matrices from the University of Florida sparse matrix collection.
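The triple-based formulation mentioned in the abstract can be sketched on a single CPU as follows. This is a plain expand, sort, and merge scheme written for illustration; it does not reproduce the thesis's partial sorting, pipelining, or CPU-GPU split, and the function name and the (row, column, value) input layout are assumptions.

```python
from collections import defaultdict
from itertools import groupby


def spmm_triples(a_triples, b_triples):
    """Multiply two sparse matrices given as (row, col, value) triples.

    Returns the product as triples sorted by (row, col).
    """
    # Index B by row so that each nonzero A[i, k] finds row k of B directly.
    b_rows = defaultdict(list)
    for k, j, b in b_triples:
        b_rows[k].append((j, b))

    # Expansion: one partial product per matching pair of nonzeros.
    products = []
    for i, k, a in a_triples:
        for j, b in b_rows.get(k, ()):
            products.append((i, j, a * b))

    # Full sort by (row, col); the thesis replaces this with a partial sort.
    products.sort(key=lambda t: (t[0], t[1]))

    # Merge: sum partial products that land on the same output entry.
    result = []
    for (i, j), group in groupby(products, key=lambda t: (t[0], t[1])):
        result.append((i, j, sum(v for _, _, v in group)))
    return result


if __name__ == "__main__":
    a = [(0, 0, 1), (0, 1, 2), (1, 1, 3)]   # [[1, 2], [0, 3]]
    b = [(0, 0, 4), (1, 0, 5), (1, 1, 6)]   # [[4, 0], [5, 6]]
    print(spmm_triples(a, b))
```

The sort step dominates this scheme, which is why the ordering already present in the row indices is worth exploiting.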
5

High performance Monte Carlo computation for finance risk data analysis

Zhao, Yu January 2013
Finance risk management plays an increasingly important role in the finance sector, both in analysing finance data and in preventing potential crises. Value at Risk (VaR) is widely recognised as an effective method for finance risk management and evaluation. This thesis conducts a comprehensive review of a number of VaR methods and discusses their strengths and limitations in depth. Among these methods, Monte Carlo simulation and analysis has proven to be the most accurate in finance risk evaluation owing to its strong modelling capabilities. However, one major challenge in Monte Carlo analysis is its high computational complexity of O(n²). To speed up the computation, this thesis parallelises Monte Carlo using the MapReduce model, which has become a major programming model in support of data-intensive applications. MapReduce consists of two functions, Map and Reduce. The Map function segments a large data set into small data chunks and distributes these chunks among a number of computers for parallel processing, with each Mapper processing one data chunk on a computing node. The Reduce function collects the results generated by the Mappers and generates an output. The parallel Monte Carlo is evaluated first in a small-scale MapReduce experimental environment and subsequently in a large-scale simulation environment. Both experimental and simulation results show that the MapReduce-based parallel Monte Carlo is much faster than the sequential Monte Carlo, while the accuracy level is maintained. In data-intensive applications, moving huge volumes of data among the computing nodes can incur high communication overhead. To address this issue, this thesis further considers data locality in the MapReduce-based parallel Monte Carlo and evaluates the impact of data locality on computational performance.
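The Map and Reduce roles described in the abstract can be sketched in a few lines. This is a single-process illustration under stated assumptions, not the thesis's implementation: returns are drawn from a normal distribution, each chunk is identified by a seed, and the names `map_simulate`, `reduce_var`, and `parallel_var` are hypothetical. On a real cluster each `map_simulate` call would run on its own node.

```python
import random


def map_simulate(chunk_seed, n_paths, mean, stdev):
    """Map step: simulate one chunk of portfolio returns.

    A per-chunk seed keeps chunks independent and reproducible.
    """
    rng = random.Random(chunk_seed)
    return [rng.gauss(mean, stdev) for _ in range(n_paths)]


def reduce_var(chunks, confidence):
    """Reduce step: merge all simulated returns and read off the VaR.

    VaR is reported as a positive loss at the given confidence level.
    """
    returns = sorted(r for chunk in chunks for r in chunk)
    index = int((1.0 - confidence) * len(returns))
    return -returns[index]


def parallel_var(n_chunks, paths_per_chunk, mean, stdev, confidence):
    """Run the mappers (sequentially here) and reduce their results."""
    chunks = map(
        lambda seed: map_simulate(seed, paths_per_chunk, mean, stdev),
        range(n_chunks),
    )
    return reduce_var(chunks, confidence)


if __name__ == "__main__":
    # 4 chunks of 1000 paths, zero mean return, 2% volatility, 95% VaR.
    print(parallel_var(4, 1000, 0.0, 0.02, 0.95))
```

Because each mapper needs only a seed and a few parameters rather than a slice of a large data set, very little data moves between nodes in this toy version; the data locality question raised in the abstract arises when the mappers read large input data instead.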
6

Data Layout Optimization Techniques for Modern and Emerging Architectures

Lu, Qingda January 2008
No description available.
7

Fast Fourier transform for option pricing: improved mathematical modeling and design of an efficient parallel algorithm

Barua, Sajib 19 May 2005
The Fast Fourier Transform (FFT) has been used in many scientific and engineering applications, and its use for pricing financial derivatives has been gaining momentum in the recent past. This thesis makes two contributions. i) We have improved a recently proposed FFT model for pricing financial derivatives to help design an efficient parallel algorithm. The improved mathematical model bridges a gap between quantitative approaches to the option pricing problem and the practical implementation of such approaches on modern computer architectures. The thesis goes further by proving that the improved model of the fast Fourier transform for option pricing produces accurate option values. ii) We have developed a parallel algorithm for the FFT based on the classical Cooley-Tukey algorithm and improved it by introducing a data swapping technique that brings data closer to the respective processors. This reduces the communication overhead to a large extent and leads to better performance of the parallel algorithm. We have tested the new algorithm on a 20-node SunFire 6800 high-performance computing system and compared it with the traditional Cooley-Tukey algorithm. Option values are calculated for various strike prices, with the strike-price spacing selected to ensure fine-grid integration for the FFT computation and to maximize the number of strikes lying in the desired region of the stock price. Compared to the traditional Cooley-Tukey algorithm, the new algorithm with data swapping performs better by more than 15% for large data sizes. In a rapidly changing marketplace, these improvements could mean a lot to an investor or financial institution, because obtaining results faster offers a competitive advantage. / October 2004
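For reference, the classical Cooley-Tukey algorithm that the abstract takes as its baseline can be sketched as a sequential radix-2 recursion. The sketch does not include the thesis's data swapping technique or any parallelism; the function name `fft` and the power-of-two length restriction are choices made for this example.

```python
import cmath


def fft(x):
    """Radix-2 decimation-in-time Cooley-Tukey FFT.

    The input length must be a power of two.
    """
    n = len(x)
    if n == 0:
        raise ValueError("input must not be empty")
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")

    # Split into even- and odd-indexed samples and transform each half.
    even = fft(x[0::2])
    odd = fft(x[1::2])

    # Butterfly step: combine the two half-size transforms.
    out = [0j] * n
    half = n // 2
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


if __name__ == "__main__":
    print(fft([1, 1, 1, 1]))
```

In a distributed version, the butterfly step is where values held by different processors must be exchanged, which is the communication cost the abstract's data swapping technique targets.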
10

Gestion hétérogène des données dans les hiérarchies mémoires pour l’optimisation énergétique des architectures multi-coeurs / Read Only Data Specific Management for an Energy Efficient Memory System

Vaumourin, Gregory 04 October 2016
Energy consumption in the memory hierarchy is a major issue in current architectures, both for embedded systems limited by their batteries and for supercomputers limited by their thermal design power. Introducing classification information into the memory system allows heterogeneous data management adapted to each particular kind of data. In this thesis, we focus on read-only data and study the possibilities of managing it specifically in the memory hierarchy through compilation/architecture codesign. This opens up new potential in terms of data locality, architecture scalability, and memory design. Evaluated by simulation on a multi-core architecture, the proposed solution offers a significant reduction in energy consumption at constant performance.
