1. Real-time probabilistic reasoning system using Lambda architecture
Anikwue, Arinze. January 2019.
Thesis (MTech (Information Technology)), Cape Peninsula University of Technology, 2019.
The proliferation of data from sources such as social media and sensor devices has overwhelmed traditional data storage and analysis technologies. This has prompted radical improvements in data management techniques, tools and technologies to meet the growing demand for effective collection, storage and curation of large datasets. Most of these technologies are open source.
Big data is usually described as very large datasets. However, a major feature of big data is its velocity: data flows in as a continuous stream and must be acted on in real time to yield meaningful, relevant value. Although there has been an explosion of technologies for handling big data, they usually process large (historical) datasets and real-time big data independently. Hence the need for a unified framework that handles both high-volume datasets and real-time big data, which led to the development of models such as the Lambda architecture.
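The Lambda architecture splits processing into a batch layer that recomputes views over the full historical (master) dataset, a speed layer that covers only data arriving since the last batch run, and a serving layer that merges the two at query time. The following is a minimal single-process sketch of that idea, assuming a simple per-key event count as the computed view; the class and method names are hypothetical and do not come from the thesis, and a real deployment would run each layer on separate distributed components.

```python
from __future__ import annotations

from collections import Counter


class LambdaCounter:
    """Toy, single-process Lambda architecture that counts events per key."""

    def __init__(self) -> None:
        self.master_dataset: list[str] = []           # immutable, append-only raw events
        self.batch_view: Counter[str] = Counter()     # precomputed from the master dataset
        self.realtime_view: Counter[str] = Counter()  # events since the last batch run

    def ingest(self, event: str) -> None:
        # New data is appended to the master dataset and counted by the speed layer.
        self.master_dataset.append(event)
        self.realtime_view[event] += 1

    def run_batch(self) -> None:
        # Batch layer: recompute the view from scratch over all historical data.
        # In this synchronous toy the batch covers every ingested event, so the
        # speed-layer state is reset; a real system would keep events that
        # arrived while the batch job was still running.
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view.clear()

    def query(self, key: str) -> int:
        # Serving layer: merge the batch view with the real-time view.
        return self.batch_view[key] + self.realtime_view[key]


lc = LambdaCounter()
for e in ["click", "view", "click"]:
    lc.ingest(e)
lc.run_batch()
lc.ingest("click")        # arrives after the batch run
print(lc.query("click"))  # 3: two from the batch view, one from the real-time view
```

Recomputing the batch view from the immutable master dataset, rather than mutating it, is what lets the batch layer correct any approximation the speed layer made.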
Effective decision-making requires processing historical as well as real-time data. Some decisions involve complex processes that depend on the likelihood of events. Probabilistic systems were designed to handle such uncertainty: they use probabilistic models built on probability theory, such as hidden Markov models, together with inference algorithms to process data and produce probabilistic scores. However, developing these models requires extensive knowledge of statistics and machine learning, which makes modelling real-life circumstances an uphill task. Probabilistic programming, a new research area, has been introduced to alleviate this bottleneck.
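As an illustration of how an inference algorithm turns a hidden Markov model into probabilistic scores, the sketch below implements the standard forward algorithm in plain Python. The two-state "normal"/"faulty" sensor model and all of its probabilities are made-up assumptions for demonstration only; the function returns the likelihood of an observation sequence and the filtered probability of each hidden state at the last step.

```python
def forward(obs, start_p, trans_p, emit_p):
    """Forward algorithm for a discrete HMM.

    Returns the likelihood P(obs) and the filtered distribution
    P(state at last step | obs). For long sequences, alpha should be
    normalised at each step to avoid numerical underflow.
    """
    states = list(start_p)
    # alpha[s] = P(o_1, ..., o_t, state_t = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    likelihood = sum(alpha.values())
    filtered = {s: a / likelihood for s, a in alpha.items()}
    return likelihood, filtered


# Hypothetical two-state sensor model (all numbers are illustrative).
start = {"normal": 0.9, "faulty": 0.1}
trans = {
    "normal": {"normal": 0.95, "faulty": 0.05},
    "faulty": {"normal": 0.1, "faulty": 0.9},
}
emit = {
    "normal": {"ok": 0.9, "alarm": 0.1},
    "faulty": {"ok": 0.3, "alarm": 0.7},
}

likelihood, filtered = forward(["ok", "alarm", "alarm"], start, trans, emit)
print(f"P(faulty | observations) = {filtered['faulty']:.3f}")
```

The forward algorithm sums over all hidden-state paths in time linear in the sequence length, instead of enumerating every path explicitly.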
This research proposes combining modern open-source big data technologies with probabilistic programming and the Lambda architecture, running on readily available hardware, to develop a highly fault-tolerant and scalable tool that processes both historical and real-time big data in real time as a single, common solution. The system aims to give decision makers the capacity to make better-informed decisions, especially in the face of uncertainty.
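One reason probabilistic models fit the Lambda architecture is that, for conjugate models, a posterior computed in batch over historical data can be updated incrementally as new events stream in, giving the same result as recomputing over all the data. The sketch below assumes a Beta-Bernoulli model of an event probability; it is a hand-written conjugate update, not probabilistic programming and not the model used in the thesis, and the names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief about the probability that an event occurs."""

    alpha: float = 1.0  # pseudo-count of occurrences (1, 1 = uniform prior)
    beta: float = 1.0   # pseudo-count of non-occurrences

    def update(self, occurred: bool) -> None:
        # Speed layer: conjugate Bayesian update for one Bernoulli observation.
        if occurred:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        # Posterior mean of the event probability.
        return self.alpha / (self.alpha + self.beta)


def batch_belief(history: list[bool]) -> BetaBelief:
    # Batch layer: fold all historical observations into the posterior at once.
    hits = sum(history)
    return BetaBelief(1.0 + hits, 1.0 + len(history) - hits)


history = [True, False, False, True]  # hypothetical historical events
stream = [True, True]                 # hypothetical events arriving in real time

belief = batch_belief(history)
for event in stream:
    belief.update(event)

print(belief.mean)  # 0.625
```

Because the streaming updates reproduce the batch result exactly, the speed layer can answer queries between batch runs without drifting from what the next batch run will compute.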
The outcome of this research is a technology product, built and assessed using experimental evaluation methods. The research uses the Design Science Research (DSR) methodology because it provides guidelines for the effective and rigorous construction and evaluation of an artefact. Probabilistic programming in the big data domain is still in its infancy; however, the developed artefact demonstrated the significant potential of combining probabilistic programming with the Lambda architecture to process big data.
2. Distributed computing system and big data real-time processing structure - based on YARN, Storm and Spark
Tseng, Po Wei (曾柏崴). Unknown date.
With the arrival of the big data era, both the immediacy and the volume of data computation face many challenges. In futures market forecasting, for example, the market state must be forecast accurately within tens of milliseconds, using a model built from large volumes of data (hundreds of GB to tens of TB).
This research introduces a real-time big data computing architecture that addresses three practical requirements: high-speed processing, large-scale data processing, and data storage. Within the parallel distributed computing system, several artificial intelligence algorithms, such as Support Vector Machine (SVM) and Logistic Regression (LR), are also implemented as a subsystem for strategy simulation. The architecture involves three main cloud computing techniques:
1. Apache YARN serves as an integrated resource management system, so that cluster resources are used more efficiently.
2. To meet the high-speed processing requirement, Apache Storm processes large real-time data streams and computes thousands of market-state values within tens of milliseconds for subsequent model building.
3. Apache Spark provides a distributed computing framework for model building. Using Spark RDDs (Resilient Distributed Datasets), the architecture reduces SVM and LR model-building time to within hundreds of milliseconds.
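Spark typically trains models such as logistic regression in a data-parallel way: each partition of an RDD computes a partial gradient (a map step) and the partial results are summed (a reduce step) before the weights are updated. The sketch below simulates that pattern sequentially in plain Python with full-batch gradient descent on the log loss; the data, partitioning, learning rate and epoch count are assumptions, SVM is not shown, and in PySpark the per-partition loop would usually be expressed with `RDD.map` and `RDD.reduce` (or `treeAggregate`) rather than a list comprehension.

```python
import math
from functools import reduce


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def partition_gradient(w, partition):
    """Map step: sum of log-loss gradients over one partition of (x, y) pairs."""
    g = [0.0] * len(w)
    for x, y in partition:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            g[j] += err * xj
    return g


def add_vectors(a, b):
    """Reduce step: element-wise sum of two partial gradients."""
    return [ai + bi for ai, bi in zip(a, b)]


def train_logistic_regression(partitions, dim, lr=0.5, epochs=200):
    n = sum(len(p) for p in partitions)
    w = [0.0] * dim
    for _ in range(epochs):
        # In Spark, each partition's gradient would be computed on an executor.
        partials = [partition_gradient(w, p) for p in partitions]
        total = reduce(add_vectors, partials)
        w = [wi - lr * gi / n for wi, gi in zip(w, total)]
    return w


# Hypothetical data: feature vector [bias, value], label 0 or 1, in two partitions.
partitions = [
    [([1.0, -2.0], 0), ([1.0, -1.0], 0)],
    [([1.0, 1.0], 1), ([1.0, 2.0], 1)],
]
w = train_logistic_regression(partitions, dim=2)
```

Since the gradient is a sum over examples, splitting the data into partitions changes only where the partial sums are computed, not the resulting weights (up to floating-point rounding).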
To meet these requirements, an n-tier distributed architecture integrates the techniques above. Apache Kafka serves as the messaging middleware, supporting asynchronous, message-based communication among the subsystems.
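Asynchronous, message-based communication decouples a producing subsystem (for example, stream processing that emits market-state values) from a consuming one (for example, model building), so each runs at its own pace. The sketch below stands in for a Kafka topic with an in-process `asyncio.Queue`; it illustrates only the decoupling, not Kafka's partitioned, persistent log or consumer offsets, and the end-of-stream sentinel and doubling step are hypothetical placeholders.

```python
import asyncio


async def producer(topic: asyncio.Queue, values) -> None:
    # Publishing subsystem: sends messages without waiting for them to be processed.
    for v in values:
        await topic.put(v)  # blocks only if the bounded queue is full
    await topic.put(None)   # hypothetical end-of-stream sentinel


async def consumer(topic: asyncio.Queue, results: list) -> None:
    # Subscribing subsystem: handles messages at its own pace.
    while True:
        msg = await topic.get()
        if msg is None:
            break
        results.append(msg * 2)  # placeholder for real work, e.g. scoring a model


async def main(values) -> list:
    topic: asyncio.Queue = asyncio.Queue(maxsize=10)  # stands in for a Kafka topic
    results: list = []
    await asyncio.gather(producer(topic, values), consumer(topic, results))
    return results


print(asyncio.run(main([1, 2, 3])))  # [2, 4, 6]
```

The bounded queue also gives simple back-pressure: a fast producer pauses once ten messages are waiting, rather than exhausting memory.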
3. Fault-tolerant Programming Models and Computing Frameworks
Kurt, Mehmet Can. 14 October 2015.
No description available.
4. A Study of TAIEX Futures High-frequency Trading by using EEMD-based Neural Network Learning Paradigms
Huang, Sven Shih Hao (黃仕豪). Unknown date.
Financial markets are complex, unstable and non-linear systems. Prices look random, yet characteristics and relationships are hidden within the randomness, and apparent regularities usually have exceptions; forecasting the next price is as complex as weather forecasting in the natural world. Time-series forecasting is an important problem in many fields, finance included. To address the non-linearity and non-stationarity of financial time series, this thesis proposes several hybrid models that combine Ensemble Empirical Mode Decomposition (EEMD), which is well suited to processing time-series signals, back-propagation neural networks (BPNN), which can handle non-linear problems, and the autoregressive moving-average (ARMA) model traditionally used in financial time-series analysis. Incorporating the concept of candlestick charts, the models are applied to high-frequency trading of TAIEX futures. To process the massive data generated at high-frequency time scales, the thesis draws on big data processing and adopts the single program, multiple data (SPMD) parallel computing approach to improve computational efficiency. By analysing the intrinsic mode functions (IMFs) at high-frequency time scales, it also investigates their physical meaning and the factors that influence TAIEX futures prices. In backtesting that excludes transaction costs, the proposed enhanced hybrid models outperform the standard EEMD-BPNN model and perform well, suggesting that hybrid models combining ANNs, EEMD and ARMA are effective for high-frequency trading and merit further development.
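The abstract mentions both candlestick charts and SPMD parallelism. The sketch below shows the two ideas together under stated assumptions: the same function (`candles`) runs on every worker, each worker receives a different slice of tick data, and ticks are routed by bar so that no bar is split across workers, which makes the merged result identical to a serial run. The tick format `(timestamp_seconds, price)` and the round-robin routing are assumptions, not details from the thesis; threads are used only to keep the example self-contained, since in CPython they do not speed up CPU-bound work and a real SPMD setup would run separate processes or nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def candles(ticks, bar_seconds):
    """Aggregate (timestamp_seconds, price) ticks into OHLC bars keyed by bar start."""
    bars = {}
    # Stable sort by timestamp keeps arrival order for ticks with equal timestamps.
    for ts, price in sorted(ticks, key=lambda t: t[0]):
        start = ts // bar_seconds * bar_seconds
        if start not in bars:
            bars[start] = [price, price, price, price]  # open, high, low, close
        else:
            bar = bars[start]
            bar[1] = max(bar[1], price)
            bar[2] = min(bar[2], price)
            bar[3] = price
    return bars


def spmd_candles(ticks, bar_seconds, workers=4):
    # Route each tick by its bar index so every bar lands on exactly one worker.
    chunks = [[] for _ in range(workers)]
    for ts, price in ticks:
        chunks[(ts // bar_seconds) % workers].append((ts, price))
    # Single program, multiple data: every worker runs `candles` on its own chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        merged = {}
        for bars in pool.map(lambda chunk: candles(chunk, bar_seconds), chunks):
            merged.update(bars)
    return dict(sorted(merged.items()))


ticks = [(0, 100), (1, 102), (2, 99), (3, 101), (5, 103), (6, 104), (11, 98)]
print(spmd_candles(ticks, bar_seconds=5, workers=2))
# {0: [100, 102, 99, 101], 5: [103, 104, 103, 104], 10: [98, 98, 98, 98]}
```

Partitioning on bar boundaries is the key design choice: it removes any need for workers to exchange partial bars, so the merge step is a plain dictionary union.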