
應用kNN文字探勘技術於分析新聞評論 影響股價漲跌趨勢之研究 / The Study of Analyzing Comments of News for Influence of Stock Price Trends Prediction by Using kNN Text Mining

With the rapid development of the Internet, the channels through which many users obtain knowledge and news have shifted from traditional media to the web. The messages that users leave behind when they interact online, known as Internet word-of-mouth, are also attracting growing attention. Meanwhile, as the economy develops, people on fixed salaries cannot afford high housing prices and a high cost of living, so increasing personal wealth through investment has become very common, and the stock market is the route the public values most.

Online news has the immediacy of the Internet. Combined with the comments that readers leave after reading and internalizing it, it should contain more information than the news content alone, and investors can use it to find the large amount of market news and information hidden within.

To help users uncover the meaning behind this huge volume of data and thereby provide investment forecasts, this study collects a total of 1068 online news articles and reader comments and divides them into training data and test data. Text mining and related techniques are used for preprocessing. kNN clustering is then used to compute the similarity between training documents, and the large volume of unlabeled data is clustered according to that similarity. Historical stock price information is used to analyze and interpret the features of the resulting clusters and to build a prediction model. Finally, the model's clustering results are evaluated on the test data in order to predict stock price trends. Evaluating the model built from the training data on the test data shows that, with k = 15 and a similarity threshold of 0.05, the F-measure for the "rise" cluster reaches up to 56%. The value of k and the similarity threshold can be adjusted to obtain the most favorable model results.
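The abstract does not state which term weighting, similarity measure, or voting rule the thesis uses, so the following is only a minimal sketch of how a kNN step with a similarity threshold and an F-measure evaluation might fit together. TF-IDF weighting with a smoothed IDF, cosine similarity, majority voting, whitespace tokenization, the "rise"/"fall" labels, the function names, and the toy documents are all assumptions made for illustration, not details taken from the thesis.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors (dicts) from whitespace-tokenized documents (assumed weighting)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so that no term receives a zero or negative weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf

def vectorize(doc, idf):
    """Vectorize an unseen document with the training IDF; unknown terms are dropped."""
    counts = Counter(doc.lower().split())
    return {t: tf * idf[t] for t, tf in counts.items() if t in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_label(query, train_vecs, train_labels, k=15, threshold=0.05):
    """Majority vote over the k most similar training documents.

    Neighbours whose similarity falls below the threshold are ignored;
    None is returned when no neighbour passes it.
    """
    scored = sorted(
        ((cosine(query, vec), label) for vec, label in zip(train_vecs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = [label for sim, label in scored[:k] if sim >= threshold]
    return Counter(votes).most_common(1)[0][0] if votes else None

def f_measure(predicted, actual, target):
    """Harmonic mean of precision and recall for one target label."""
    true_pos = sum(p == target and a == target for p, a in zip(predicted, actual))
    if true_pos == 0:
        return 0.0
    precision = true_pos / sum(p == target for p in predicted)
    recall = true_pos / sum(a == target for a in actual)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data; labels stand in for trends derived from historical prices.
train_docs = [
    "profit growth beats forecast",
    "strong orders lift revenue growth",
    "losses widen as demand drops",
    "weak demand cuts revenue forecast",
]
train_labels = ["rise", "rise", "fall", "fall"]
train_vecs, idf = tfidf_vectors(train_docs)

query = vectorize("orders and profit growth", idf)
print(knn_label(query, train_vecs, train_labels, k=3, threshold=0.05))
```

In this sketch the threshold acts as a filter on the neighbour list, so a document with no sufficiently similar neighbour is left unlabeled rather than forced into a group. That is one plausible reading of how k and the similarity threshold interact; the thesis may combine them differently.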

Identifier: oai:union.ndltd.org:CHENGCHI/G0100356044
Creators: 詹智勝 (Chan, Chih Sheng)
Publisher: 國立政治大學 (National Chengchi University)
Source Sets: National Chengchi University Libraries
Language: Chinese
Detected Language: English
Type: text
Rights: Copyright © nccu library on behalf of the copyright holders
