1 |
應用情感型態分析於指數股票型基金趨勢研究-以台灣卓越50基金為例 / A study on the trend of exchange traded funds by sentiment pattern analysis in Yuanta Taiwan Top 50 ETF / 林詠翔 (Lin, Yong-Xiang), Unknown Date
Prior research indicates that ETF assets have grown rapidly in recent years, and the Yuanta Taiwan Top 50 ETF (0050) is popular with investors because of advantages such as its large market size. The development of big data has brought text-mining techniques to maturity, so this study proposes a sentiment-analysis-based price prediction model intended to improve investors' returns.
Earlier text-mining studies used the individual terms of a document as the unit of analysis, which often causes problems with synonymy and polysemy. This study therefore proposes a supervised learning method, sentiment pattern analysis, to build the model. Because training data for supervised learning are hard to obtain, we additionally combine unsupervised learning methods to perform topic clustering and sentiment-orientation labeling.
We build a text dataset of Taiwan stock-market news and screen a lexicon of trending-topic terms, which feeds an unsupervised LDA topic model. The results show that during the 2016 Taiwan presidential election, media attention to company-related topics declined and the number of related articles dropped sharply. In the sentiment-labeling stage, a sentiment dictionary that combines NTUSD, HowNet, and a self-developed expansion algorithm assigns a polarity to 10% of previously neutral terms and labels a sentiment orientation for 96% of the documents.
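The dictionary-based labeling step described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the tiny `POSITIVE` and `NEGATIVE` word sets, the function names, and the simple hit-count rule are all assumptions, and the real lexicon (NTUSD, HowNet, and the expansion algorithm) is far larger.

```python
# Hypothetical toy lexicon; the thesis merges NTUSD, HowNet, and an
# expansion algorithm, none of which are reproduced here.
POSITIVE = {"上漲", "成長", "獲利"}
NEGATIVE = {"下跌", "虧損", "衰退"}


def label_document(tokens):
    """Label a tokenized document by comparing lexicon hit counts."""
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # no hits, or a tie between polarities


def labeled_share(documents):
    """Fraction of documents that receive a non-neutral label."""
    if not documents:
        return 0.0
    labeled = sum(1 for doc in documents if label_document(doc) != "neutral")
    return labeled / len(documents)
```

A share computed this way corresponds in spirit to the proportion of documents that receive a sentiment orientation; a larger lexicon generally raises it because fewer documents end up with no hits.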
Analysis with visualization tools shows that DIF-MACD can predict the long-term trend of the Taiwan Top 50 ETF, while the news sentiment index performs well for short-term price fluctuations. Within the topic-model clusters, the news sentiment indices of the macroeconomy and company-operations categories act as leading indicators by about 1 to 2 days, which is useful for the subsequent price prediction model.
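For readers unfamiliar with the indicator, DIF-MACD is commonly computed as sketched below. The abstract does not state its parameters, so the conventional 12/26/9 spans, the seeding of each exponential moving average with the first observation, and the reading of "MACD" as the signal line (an EMA of DIF) are assumptions here.

```python
def ema(values, span):
    """Exponential moving average seeded with the first observation."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def dif_macd(closes, fast=12, slow=26, signal=9):
    """Return the DIF-MACD series for a list of closing prices.

    DIF  = EMA(fast) - EMA(slow)
    MACD = EMA(signal) of DIF (the signal line, by assumption)
    """
    dif = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    macd = ema(dif, signal)
    return [d - m for d, m in zip(dif, macd)]
```

With these definitions a flat price series gives values of zero, and a steadily rising series gives positive values, which is why the sign of the series is often read as a trend signal.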
For supervised sentiment analysis, to address the synonymy and polysemy problems noted above, we apply the Pattern Taxonomy Model (PTM) to Chinese texts and compare it with the vector space model (VSM) and support vector machines (SVM). The experiments show that the optimized PTM, combined with the Taiwan Weighted Stock Index, performs comparatively well, with an F1-measure of up to 85%. We further find that news sentiment published during non-trading hours can influence the price volatility of 0050.
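The F1-measure reported above is the harmonic mean of precision and recall. A minimal sketch of the computation follows; the label strings and the function name are illustrative assumptions, and the thesis's evaluation data are not reproduced.

```python
def f1_measure(y_true, y_pred, positive="positive"):
    """Harmonic mean of precision and recall for one target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, the score stays low unless both precision and recall are high, which makes it a stricter summary than accuracy when the sentiment classes are imbalanced.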
|
2 |
肺癌之研究及保單設計 / Study and price insurance for the lung cancer / 葉步釩 (Ye, Bu Fan), Unknown Date
This study uses the Longitudinal Health Insurance Database 2005 (LHID2005) from Taiwan's National Health Insurance Research Database (NHIRD). From the records of its 400,000 insured persons, we select lung and bronchus cancer patients, analyze their characteristics, and compare them with lung cancer data from the U.S. National Cancer Institute. In both datasets, male patients outnumber female patients, and diagnoses peak at ages 65 to 74.
Next, we consolidate the outpatient prescription and treatment detail files and the inpatient expenditure detail files to summarize lung cancer patients' outpatient and inpatient costs from 2005 to 2012 and to compare the cost items. Among outpatient costs, drug charges are the highest. The five largest inpatient items are drug fees, ward fees, radiological diagnosis and treatment fees, examination fees, and therapeutic procedure fees.
Finally, we construct a multiple state model of lung cancer treatment covering surgery, radiotherapy, and chemotherapy, estimate the transition intensities between states, and use them to calculate the pure premium of a five-year, single-premium term lung cancer policy.
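The link between transition intensities and a pure premium can be illustrated with a deliberately simplified sketch. It is not the thesis's treatment-state model: it assumes only three states (healthy, diagnosed, dead), constant intensities, a lump-sum benefit paid at the end of the year of diagnosis, and a flat interest rate, and every parameter value and function name is hypothetical.

```python
import math


def occurrence_exposure_rate(transitions, exposure_years):
    """Estimate a constant transition intensity as events per person-year."""
    return transitions / exposure_years


def net_single_premium(mu, nu, interest, benefit, term_years=5):
    """Single pure premium for a lump sum paid at the end of the year of diagnosis.

    mu: healthy -> diagnosed intensity (assumed constant)
    nu: healthy -> dead intensity (assumed constant)
    """
    total = mu + nu
    if total == 0:
        return 0.0
    v = 1.0 / (1.0 + interest)  # annual discount factor
    # Probability of leaving the healthy state within one year via diagnosis.
    p_diag_in_year = (mu / total) * (1.0 - math.exp(-total))
    premium = 0.0
    for t in range(term_years):
        p_healthy_at_t = math.exp(-total * t)
        premium += v ** (t + 1) * p_healthy_at_t * p_diag_in_year
    return benefit * premium
```

For example, `occurrence_exposure_rate(30, 1500.0)` gives an intensity of 0.02 per person-year, which could then be passed as `mu`. A model with separate surgery, radiotherapy, and chemotherapy states would need one intensity per transition and a matrix form of the same calculation.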
|