Global ETD Search

1	金融大數據與深度學習平台之設計與實作 / Design and Implementation of the Big Data in Finance and Deep Learning Platform 陳昱銘, Chen, Yu-Ming Unknown Date (has links)
This research aims to provide an intelligent algorithmic trading platform for financial data. The platform uses Django CMS as its web framework and is divided into a development environment and a trading environment. Its functions comprise "User Research and Development", "User Testing", and "Algorithmic Services". User research, development, and testing use the IPython interactive development interface, with JupyterHub handling management and configuration so that many users can access the platform simultaneously and the platform can scale to a large user base. Algorithmic services are packaged as Celery tasks so that they can be handed to the back end for distributed computing. In response to the recent growth of deep learning, the platform also adds TensorFlow and GPU deployment to support multi-core, high-speed algorithm computation. To handle large volumes of complex, structured financial data, this research adopts HAWQ as the database. Its massively parallel processing architecture alleviates the system complexity and performance bottlenecks previously caused by accessing massive data sets. Combining HAWQ with Ambari provides creation, monitoring, and management of the Hadoop distributed cluster, which makes deployment and maintenance considerably easier for developers. Because traditional table designs may not suit HAWQ, this research designs tables tailored to how programs access the financial data in the database and evaluates their performance to ensure that the data can be retrieved effectively and quickly.
Financial big data; Deep learning; Massively parallel processing; FinTech; HAWQ; JupyterHub; TensorFlow; Celery
2	基於大數據資料的非監督分散式分群演算法 / An Effective Distributed GHSOM Algorithm for Unsupervised Clustering on Big Data 邱垂暉, Chiu, Chui Hui Unknown Date (has links)
Clustering techniques that group samples by attribute similarity are widely used in fields such as pattern recognition, feature extraction, and malicious behavior characterization. Because of their importance, various clustering techniques have been reimplemented on distributed frameworks for scalable computation, for example K-means with Hadoop in Apache Mahout. K-means requires the number of clusters to be given in advance, and self-organizing maps (SOM) require the map size to be given in advance. GHSOM (growing hierarchical self-organizing maps) instead clusters samples dynamically to satisfy a tolerance on the variation between samples, which makes it an attractive unsupervised learning method for data that carry too little information to fix the number of clusters beforehand. However, GHSOM is currently a sequential rather than a distributed algorithm, which limits its application to big data. In this paper, we present a novel distributed GHSOM algorithm. We use the Scala actor model to parallelize GHSOM construction, distributing the vertical and horizontal expansion tasks to actors, and show a significant performance improvement. To evaluate the approach, we collect and analyze the real-world execution behaviors of thousands of malware programs and derive detection rules by applying the unsupervised clustering to millions of samples, showing its performance improvement, the effectiveness of the rules, and its potential use in practice.
Unsupervised clustering; GHSOM; Actor model; Malware detection; Parallel computation
3	探索類神經網路於網路流量異常偵測中的時效性需求 / Exploring the timeliness requirement of artificial neural networks in network traffic anomaly detection 連茂棋, Lian, Mao-Ci Unknown Date (has links)
With the rise of the cloud, people do almost everything over the Internet, but some people with bad intentions use malicious programs to launch attacks or steal information through network connections. To prevent these cyber-attacks, network traffic must be checked continuously. In the current cloud environment, however, network data are so large and complex that checking all of them is both time-consuming and inefficient. This study uses TensorFlow with multiple Graphics Processing Units (GPUs) to implement an Artificial Neural Network (ANN) mechanism that analyzes network traffic data and derives detection rules for distinguishing normal from malicious traffic; we call this Network Traffic Anomaly Detection (NTAD). We also design experiments to verify whether the proposed ANN mechanism meets the timeliness and effectiveness requirements of NTAD. In the experiments, we found that using more GPUs reduces the time needed to train the neural network, and that using three GPUs in our experimental design meets the timeliness requirement of NTAD. According to the preliminary experimental results, the proposed mechanism performs better than an ANN trained with the back-propagation algorithm.
Network traffic anomaly detection; Machine learning; GPU parallel operation; Artificial neural networks; TensorFlow
4	基於 EEMD 與類神經網路方法進行台指期貨高頻交易研究 / A Study of TAIEX Futures High-frequency Trading by using EEMD-based Neural Network Learning Paradigms 黃仕豪, Huang, Sven Shih Hao Unknown Date (has links)
The financial market is a complex, unstable, and non-linear system. It appears random, yet certain characteristics and relationships are hidden within the randomness. Forecasting the next price in finance has a complexity similar to forecasting the weather among natural phenomena. Time series forecasting has long been an important problem in many fields, and financial time series forecasting is no exception. In this thesis we address the non-linear and non-stationary nature of financial time series by proposing several hybrid models that combine Ensemble Empirical Mode Decomposition (EEMD), Back-Propagation Neural Networks (BPNN), and the autoregressive moving average (ARMA) model traditionally used in financial time series analysis. The models draw on the ability of artificial neural networks (ANNs) to handle non-linear problems and on the strengths of EEMD in processing time series signals, and they incorporate the concept of candlestick charts to attempt high-frequency trading of TAIEX futures. To process the massive data generated at high-frequency time scales, we adopt the parallel computing approach "single program, multiple data" (SPMD) to improve computing performance. We also analyze the intrinsic mode functions (IMFs) at high-frequency time scales to identify their physical meaning and the factors that affect TAIEX futures prices. The backtesting results, which exclude transaction costs, show that the proposed enhanced hybrid models outperform the standard EEMD-BPNN model and perform well. This indicates that a hybrid model combining ANNs, EEMD, and ARMA is effective for high-frequency trading and has potential for further development.
Artificial Neural Networks; Candlestick Charts (K-line charts); Autoregressive Moving Average model; Ensemble Empirical Mode Decomposition; High-Frequency Trading; Parallel Computing; Time series analysis; Big Data Processing

