The purpose of this research is to expand the segmentation lexicon with a new-word discovery algorithm based on association rules, thereby improving the accuracy of word and sentence segmentation. Using news articles about the Kuomintang (KMT) and the Democratic Progressive Party (DPP) collected from the Central News Agency, the study then builds topic models and labels sentiment orientation with an unsupervised sentiment analysis method. Finally, it builds classification models with supervised learning methods and validates the results.
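The abstract does not say which supervised learning method was used to build the classification models. As a minimal sketch only, assuming articles have already been segmented into tokens and labelled by the unsupervised step, the following trains a multinomial Naive Bayes classifier (an illustrative, swapped-in choice, not necessarily the method used in this research) and measures its accuracy. The function names `train_nb`, `predict_nb`, and `accuracy` and the smoothing parameter `alpha` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    vocab = {w for doc in docs for w in doc}
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    priors = {y: math.log(c / len(docs)) for y, c in class_docs.items()}
    likelihood = {}
    for y in class_docs:
        total = sum(word_counts[y].values()) + alpha * len(vocab)
        likelihood[y] = {w: math.log((word_counts[y][w] + alpha) / total)
                         for w in vocab}
    return priors, likelihood


def predict_nb(model, doc):
    """Return the class with the highest log posterior for a token list."""
    priors, likelihood = model

    def log_posterior(y):
        # Tokens never seen in training carry no evidence and are skipped.
        return priors[y] + sum(likelihood[y][w] for w in doc if w in likelihood[y])

    return max(priors, key=log_posterior)


def accuracy(model, docs, labels):
    """Share of documents whose predicted label matches the given label."""
    hits = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return hits / len(labels)
```

In practice the labelled articles would be split into training and test sets, so that `accuracy` is measured on articles the model has not seen.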
This research expands the lexicon used for word and sentence segmentation with an n-gram approach combined with the Apriori algorithm. A total of 32,007 words were discovered, of which 28,838 are genuine, meaningful words, a success rate of up to 88%.
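The abstract names n-gram extraction with the Apriori algorithm but gives no further detail. The sketch below is one possible reading, assuming that support means the number of documents containing a character n-gram: an n-gram is counted only if its (n-1)-character prefix and suffix were already frequent (the Apriori pruning rule), and frequent n-grams missing from the current lexicon become new-word candidates. `min_support`, `max_n`, and both function names are hypothetical choices for illustration.

```python
from collections import Counter


def apriori_ngrams(docs, min_support=2, max_n=4):
    """Frequent character n-grams (n >= 2) mapped to their document support.

    Support is the number of documents containing the n-gram. At each
    level, an n-gram is a candidate only if its (n-1)-character prefix and
    suffix were frequent at the previous level (Apriori pruning).
    """
    # Level 1: characters that appear in at least min_support documents.
    counts = Counter(ch for doc in docs for ch in set(doc))
    frequent = {g for g, c in counts.items() if c >= min_support}
    result = {}
    for n in range(2, max_n + 1):
        counts = Counter()
        for doc in docs:
            grams = {doc[i:i + n] for i in range(len(doc) - n + 1)}
            # Keep only candidates whose two (n-1)-grams survived the last level.
            counts.update(g for g in grams
                          if g[:-1] in frequent and g[1:] in frequent)
        frequent = {g for g, c in counts.items() if c >= min_support}
        if not frequent:
            break  # No frequent n-gram means no frequent (n+1)-gram either.
        result.update((g, counts[g]) for g in frequent)
    return result


def new_word_candidates(docs, lexicon, min_support=2, max_n=4):
    """Frequent n-grams absent from the lexicon, most supported first."""
    grams = apriori_ngrams(docs, min_support, max_n)
    return sorted((g for g in grams if g not in lexicon),
                  key=lambda g: (-grams[g], -len(g), g))
```

On the toy corpus `["蔡英文當選總統", "蔡英文出席活動", "韓國瑜與蔡英文辯論", "韓國瑜造勢"]` with an empty lexicon, the candidates include 蔡英文 and 韓國瑜 but also fragments such as 蔡英, which is why discovered strings still have to be checked for genuine meaning.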
To build the topic models, this research compares two clustering methods: TF-IDF with K-means (TFIDF-Kmeans) and latent Dirichlet allocation (LDA). With TFIDF-Kmeans, the number of documents is far larger than the number of issue terms, so the TF-IDF matrix is too sparse and the clustering performs poorly. LDA, by contrast, lets multiple topics be shared across multiple documents, and its topic classification accuracy exceeds 80%. This research therefore concludes that LDA performs better for clustering issue terms in texts that span multiple topics.
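To make the sparsity argument and the multi-topic property concrete, here is a minimal, dependency-free sketch under stated assumptions: documents are already segmented into lists of issue terms, TF is the length-normalised count, and IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1, which is the default in scikit-learn's `TfidfVectorizer`. `sparsity` reports the share of zero cells in the document-term matrix (it does not run K-means), and `lda_gibbs` is a standard collapsed Gibbs sampler for LDA whose `alpha`, `beta`, and `iters` defaults are illustrative, not values taken from this research.

```python
import math
import random
from collections import Counter


def tfidf_matrix(docs):
    """Dense TF-IDF matrix: one row per document, one column per term."""
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    # Smoothed IDF is always >= 1, so only absent terms produce zero cells.
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    rows = []
    for doc in docs:
        tf = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        rows.append([tf[w] / length * idf[w] for w in vocab])
    return vocab, rows


def sparsity(rows):
    """Fraction of zero cells in the document-term matrix."""
    cells = [x for row in rows for x in row]
    return sum(x == 0 for x in cells) / len(cells)


def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic mixtures."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)
    n_dk = [[0] * k for _ in docs]       # topic counts per document
    n_kw = [[0] * v for _ in range(k)]   # word counts per topic
    n_k = [0] * k                        # total tokens per topic
    z = []                               # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            n_dk[d][t] += 1
            n_kw[t][wid[w]] += 1
            n_k[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, wi = z[d][i], wid[w]
                # Remove the current token from all counts.
                n_dk[d][t] -= 1
                n_kw[t][wi] -= 1
                n_k[t] -= 1
                # Full conditional p(z = j | all other assignments).
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][wi] + beta)
                           / (n_k[j] + v * beta) for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                n_dk[d][t] += 1
                n_kw[t][wi] += 1
                n_k[t] += 1
    # theta[d][j]: share of topic j in document d (each row sums to 1).
    return [[(n_dk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
            for d, doc in enumerate(docs)]
```

For `[["選舉", "投票"], ["投票", "政黨"], ["經濟"]]` the 3 × 4 TF-IDF matrix has 5 non-zero cells, so `sparsity` returns 7/12. Each row returned by `lda_gibbs` is a probability distribution over topics, so one article can carry weight on several topics at once.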
By combining different time intervals of data, this research presents the sentiment tendencies of Central News Agency news in the three months before and after each of Taiwan's last five presidential elections, and examines how the sentiment of each category in the topic models changed over the same periods. The sentiment index of the texts generally peaks on polling day, and the results for the three most recent presidential elections show that the sentiment of news about the relevant parties levels off after the election. The positive and negative sentiment statistics and the overall sentiment analysis show that, regardless of which party is in power, Central News Agency coverage of both the KMT and the DPP is positive and stable and does not noticeably favor either party.
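The abstract does not describe how the unsupervised sentiment index is computed. A common lexicon-based approach, shown here only as an illustrative assumption, scores each segmented article as (positive terms minus negative terms) divided by its length, averages the scores per publication day, and restricts the series to roughly three months (taken here as 90 days) either side of polling day. The word lists `POSITIVE` and `NEGATIVE`, the toy articles, and all function names are hypothetical placeholders, not the dictionary or data used in this research.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy lexicons; a real study would use a full sentiment dictionary.
POSITIVE = {"支持", "勝選", "肯定"}
NEGATIVE = {"批評", "爭議", "落選"}


def sentiment_score(tokens):
    """(positive hits - negative hits) / token count for one segmented article."""
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def daily_index(articles):
    """Mean article score per publication date; articles are (date, tokens) pairs."""
    by_day = defaultdict(list)
    for day, tokens in articles:
        by_day[day].append(sentiment_score(tokens))
    return {day: sum(s) / len(s) for day, s in sorted(by_day.items())}


def election_window(index, polling_day, days=90):
    """Keep only dates within `days` days before or after polling day."""
    return {d: v for d, v in index.items() if abs((d - polling_day).days) <= days}


def polarity_counts(articles):
    """Numbers of positive, negative, and neutral articles."""
    scores = [sentiment_score(tokens) for _, tokens in articles]
    return (sum(s > 0 for s in scores), sum(s < 0 for s in scores),
            sum(s == 0 for s in scores))


if __name__ == "__main__":
    toy = [(date(2020, 1, 10), ["支持", "勝選", "投票"]),
           (date(2020, 1, 10), ["批評", "投票"]),
           (date(2020, 2, 1), ["肯定", "政黨"])]
    idx = election_window(daily_index(toy), polling_day=date(2020, 1, 11))
    print(max(idx, key=idx.get))  # prints 2020-02-01 (day with the highest index)
```

Comparing `polarity_counts` for KMT-related and DPP-related articles is one simple way to check whether coverage leans toward either party.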
Identifier | oai:union.ndltd.org:CHENGCHI/G0104356023 |
Creators | 吳信維, Wu, Xin-Wei |
Publisher | 國立政治大學 (National Chengchi University) |
Source Sets | National Chengchi University Libraries |
Language | Chinese |
Type | text |
Rights | Copyright © nccu library on behalf of the copyright holders |