1 |
High performance Monte Carlo computation for finance risk data analysis
Zhao, Yu, January 2013
Finance risk management plays an increasingly important role in the finance sector, where it is used to analyse financial data and to prevent potential crises. Value at Risk (VaR) is widely recognised as an effective method for finance risk management and evaluation. This thesis conducts a comprehensive review of a number of VaR methods and discusses their strengths and limitations in depth. Among these methods, Monte Carlo simulation and analysis has proven to be the most accurate in finance risk evaluation owing to its strong modelling capabilities. However, one major challenge in Monte Carlo analysis is its high computational complexity of O(n²). To speed up Monte Carlo analysis, this thesis parallelises it using the MapReduce model, which has become a major programming model for data-intensive applications. MapReduce consists of two functions: Map and Reduce. In the Map phase, a large data set is segmented into small data chunks that are distributed among a number of computers and processed in parallel, with each Mapper processing one data chunk on a computing node. The Reduce function collects the results generated by the Map nodes (Mappers) and produces the output. The parallel Monte Carlo is evaluated first in a small-scale MapReduce experimental environment and subsequently in a large-scale simulation environment. Both the experimental and the simulation results show that the MapReduce-based parallel Monte Carlo is significantly faster than the sequential Monte Carlo while maintaining the same level of accuracy. In data-intensive applications, moving large volumes of data among computing nodes can incur high communication overhead. To address this issue, the thesis further considers data locality in the MapReduce-based parallel Monte Carlo and evaluates its impact on computational performance.
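To make the Map/Reduce split concrete, here is a minimal single-process sketch of Monte Carlo VaR organised as map and reduce steps. It is not the thesis's implementation: it assumes a single position with normally distributed one-day returns, and the parameters `value`, `mu`, `sigma` and all function names are hypothetical. Each mapper simulates one chunk of losses from its own seed; the reducer gathers them and reads off the empirical loss quantile.

```python
import math
import random
from functools import reduce
from statistics import NormalDist


def simulate_losses(seed, n_paths, value, mu, sigma):
    """Map step: simulate n_paths one-day losses for one chunk of work."""
    rng = random.Random(seed)  # independent, reproducible stream per chunk
    # Assumed model: one-day return ~ Normal(mu, sigma); loss = -(value * return).
    return [-value * rng.gauss(mu, sigma) for _ in range(n_paths)]


def merge(acc, chunk):
    """Reduce step: collect the losses emitted by every mapper."""
    acc.extend(chunk)
    return acc


def empirical_var(losses, confidence):
    """VaR at `confidence`: the loss exceeded with probability 1 - confidence."""
    ordered = sorted(losses)
    k = math.ceil(confidence * len(ordered)) - 1
    return ordered[k]


def mapreduce_var(n_chunks, paths_per_chunk, value, mu, sigma, confidence, base_seed=0):
    seeds = [base_seed + i for i in range(n_chunks)]
    # The map calls are independent, so each could run on a different node.
    mapped = map(lambda s: simulate_losses(s, paths_per_chunk, value, mu, sigma), seeds)
    losses = reduce(merge, mapped, [])
    return empirical_var(losses, confidence), losses


if __name__ == "__main__":
    var95, _ = mapreduce_var(8, 25_000, 1_000_000, 0.0005, 0.02, 0.95)
    analytic = 1_000_000 * (NormalDist().inv_cdf(0.95) * 0.02 - 0.0005)
    print(f"Monte Carlo VaR: {var95:,.0f}  analytic: {analytic:,.0f}")  # analytic is about 32,397
```

Under these assumptions the simulated estimate should land close to the closed-form normal VaR. Because VaR is a quantile, this reducer needs every simulated loss, so shipping the losses to one node is exactly the kind of data movement whose cost the abstract raises.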
|
2 |
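An architecture for processing large volumes of data integrating scientific workflow systems and the MapReduce paradigm (original Portuguese title: Uma arquitetura para processamento de grande volumes de dados integrando sistemas de workflow científicos e o paradigma mapreduce)
Zorrilla Coz, Rocío Milagros, 13 September 2012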
Funding: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
With the exponential growth of computational power and of the data generated by scientific experiments and simulations, it is now common to find simulations that generate terabytes of data and experiments that gather petabytes. The processing these data require is currently known as data-intensive computing.
The MapReduce paradigm, implemented in the Hadoop framework, is an increasingly used alternative parallelization technique for executing distributed applications. The framework schedules job execution on clusters, provides fault tolerance, and manages all necessary communication between machines.
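The MapReduce programming model itself can be illustrated with a tiny in-process word count. This is a hedged sketch of the model only, not Hadoop's Java API: the function names are hypothetical, and everything Hadoop actually provides (scheduling across a cluster, fault tolerance, network communication) is omitted. The mapper emits key/value pairs, a shuffle step groups them by key, and the reducer combines each group.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group intermediate values by key (done by the framework between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def run_job(splits):
    intermediate = chain.from_iterable(map(map_phase, splits))
    grouped = shuffle(intermediate)
    # Keys are reduced in sorted order, loosely mirroring the key sort before reduce.
    return dict(reduce_phase(k, vs) for k, vs in sorted(grouped.items()))


print(run_job(["MapReduce schedules jobs", "Hadoop schedules MapReduce jobs"]))
# {'hadoop': 1, 'jobs': 2, 'mapreduce': 2, 'schedules': 2}
```

In a real cluster each `map_phase` call would run where its input split is stored, which is why data locality matters in the discussion that follows.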
For many types of complex applications, Scientific Workflow Systems offer advanced functionality that can be leveraged to develop, execute, and evaluate scientific experiments in different computational environments.
In the Query Evaluation Framework (QEF), workflow activities are represented as algebraic operators, and application-specific data types are encapsulated in a common tuple structure. QEF aims to automate computational processes and data management, supporting scientists so that they can concentrate on the scientific problem.
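The idea of workflow activities as algebraic operators over a common tuple structure can be sketched with composable Python generators. This is an illustration under stated assumptions, not QEF's actual API: the `Record` type, the operators `scan`, `activity` and `select`, and the astronomy-flavoured `count_rate` activity are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Record = Dict[str, Any]  # hypothetical common tuple structure


def scan(source: Iterable[Record]) -> Iterator[Record]:
    """Leaf operator: expose a data source as a stream of tuples."""
    for record in source:
        yield dict(record)  # copy so downstream operators never mutate the source


def activity(child: Iterator[Record], fn: Callable[[Record], Record]) -> Iterator[Record]:
    """Operator wrapping one workflow activity, applied tuple by tuple."""
    for record in child:
        yield fn(record)


def select(child: Iterator[Record], predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Filter operator: pass on only the tuples that satisfy the predicate."""
    for record in child:
        if predicate(record):
            yield record


def count_rate(record: Record) -> Record:
    """Hypothetical activity: detector counts per second of exposure."""
    return {**record, "rate": record["counts"] / record["exposure"]}


images = [
    {"image": "a.fits", "counts": 900, "exposure": 30},
    {"image": "b.fits", "counts": 100, "exposure": 20},
    {"image": "c.fits", "counts": 1200, "exposure": 40},
]
# Operators compose into a plan; tuples are pulled through it lazily.
plan = select(activity(scan(images), count_rate), lambda r: r["rate"] >= 10)
print([r["image"] for r in plan])  # ['a.fits', 'c.fits']
```

Because every operator consumes and produces the same tuple shape, activities can be reordered or swapped without changing their neighbours, which is the property an algebraic representation is meant to give.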
Several Scientific Workflow Systems currently provide components and task parallelization strategies for distributed environments. However, scientific experiments tend to generate large volumes of data, which may limit execution scalability with respect to data locality. For instance, data transfers may delay process execution, or a failure may occur during result consolidation.
In this work, I propose an integration of QEF with Hadoop. The main objective is to execute a scientific workflow in a data-locality-oriented manner. In this proposal, Hadoop is responsible for scheduling tasks in a distributed environment, while QEF manages the workflow activities and data sources.
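Data-locality-oriented scheduling can be pictured as a greedy assignment that prefers to run each task on a node already holding its input block, in the spirit of Hadoop's preference for node-local map tasks. The sketch below is a simplified assumption, not Hadoop's scheduler or the dissertation's code; `assign_tasks`, the block/node names, and the slot model are hypothetical.

```python
def assign_tasks(block_locations, slots):
    """Greedy locality-aware assignment of one task per data block.

    block_locations: block id -> set of nodes holding a replica of that block
    slots: node -> number of free task slots on that node
    Returns (assignment, number of node-local tasks).
    """
    free = dict(slots)  # work on a copy of the slot counts
    assignment, n_local = {}, 0
    for block in sorted(block_locations):
        local = [n for n in sorted(block_locations[block]) if free.get(n, 0) > 0]
        if local:
            node = local[0]  # data is already on this node: no transfer needed
            n_local += 1
        else:
            remote = [n for n in sorted(free) if free[n] > 0]
            if not remote:
                raise RuntimeError(f"no free slot for block {block}")
            node = remote[0]  # fall back to a remote read over the network
        free[node] -= 1
        assignment[block] = node
    return assignment, n_local


locations = {"b1": {"n1", "n2"}, "b2": {"n1"}, "b3": {"n1", "n3"}, "b4": {"n2"}}
print(assign_tasks(locations, {"n1": 1, "n2": 2, "n3": 1}))
# ({'b1': 'n1', 'b2': 'n2', 'b3': 'n3', 'b4': 'n2'}, 3)
```

Being greedy, this sketch is not optimal: placing `b1` on `n2` would have left `n1` free for `b2`, making all four tasks local. Real schedulers weigh such trade-offs against delays and cluster load.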
The proposed environment is evaluated using a scientific workflow from astronomy as a case study. The deployment of the application in a virtualized environment is described in detail. Finally, experiments evaluating the impact of the proposed environment on the perceived performance of the application are presented, and future work is discussed.
|