近年來ETF規模快速成長,亞洲區域經濟成長與穩步發展更是帶動國際ETF市場動力來源,而元大台灣50指數型證券投資信託基金因規模大,受到投資人的青睞。根據過去的研究指出,網路上的文本訊息會對群眾情緒造成影響,進而影響股價波動,對投資者而言,若能從大量網路財金快速分析投資者大眾情緒進而預測股價波動走勢,勢必可提高報酬率。然而,每日有上百篇的財金文本產生,人工分析耗時耗力,本研究採用文字探勘技術,提出一套情感分析的價格預測模型。
過去文本情感分析的研究中已證實監督式學習方法可以透過簡單量化的方式達到良好的分類效果,然而,為解決監督式學習無法預期未知的限制,本研究透過非監督式學習將2016整年度的財金文本進行文章主題判別,計算情緒指數並標記文本情緒傾向,再來使用監督式學習結合台股資訊指標、國際指標、總體經濟指標、技術指標等,建立分類模型以預測元大台灣50ETF的價格趨勢。
實驗結果中,主題標注方面,本研究發現因文本數量遠大於議題詞數量造成TF-IDF矩陣過於稀疏,使得TF-IDF結合K-means主題模型分類效果不佳。LDA主題模型基於所有主題被所有文章共享的特性,使得在字詞分群優於TF-IDF結合K-means。情緒傾向標注方面,證實本研究擴充後的情感詞集比起NTUSD有更好的字詞極性判斷效果。
本研究透過比較情緒指數結合技術指標之分類模型與單純技術指標分類模型的準確率發現,前者較後者高出7%的準確率。進一步結合間接情緒指標的分類模型更有71%準確率,故證實財金文本的情感分析確實能有效提升元大台灣50的價格趨勢預測。 / Rapid and stable economic growth in Asia motivated the asset scale of ETF in the globe growing rapidly in the recent years. Yuanta Taiwan Top 50 ETF gains the investors’ favor because of the advantages of large market scale. Past research have shown that the text documents on the internet, e.g. news and tweets, would make great effect on public emotion, and the public emotion could even affect the stock price. For investors, it is important to know how to analyze the potential emotion in text documents to predict the stock trend. However, the traditional way to analyze text documents by human cannot afford the large volume of financial text documents on the internet.
In past sentimental analysis research, supervised method is proven as a method with high accuracy, but there are limits about predicting unknown future trend. This research combined supervised and unsupervised methods to deal with these large financial text documents. By using unsupervised method to find out the topic of documents, and then calculate the sentimental index of each documents to differentiate the sentiment polarity. Afterwards, using supervised method to build a prediction model with the sentimental index.
According to the result, we found that the performance of LDA model is better than the TF-IDF with K-means model. Moreover, the prediction model which include the sentiment index has higher accuracy than the one include the technical indicators only.
Identifer | oai:union.ndltd.org:CHENGCHI/G0103356022 |
Creators | 黃泓銘, Huang, Hung-Ming |
Publisher | 國立政治大學 |
Source Sets | National Chengchi University Libraries |
Language | 中文 |
Detected Language | English |
Type | text |
Rights | Copyright © nccu library on behalf of the copyright holders |
Page generated in 0.002 seconds