Global ETD Search

1	應用情感分析於媒體新聞傾向之研究-以中央社為例 / Applying sentiment analysis to the tendency of media news: a case study of central news agency 吳信維, Wu, Xin-Wei Unknown Date (has links)
This research combines association rules with a new-word discovery algorithm to expand the lexicon and thereby improve the accuracy of word and sentence segmentation. Using an unsupervised sentiment analysis method, it collects news texts about the KMT and the DPP from the Central News Agency and builds topic models and sentiment-orientation labels. A classification model is then built with supervised learning to verify the results.
The lexicon is expanded using n-gram with the a-priori algorithm. A total of 32,007 candidate terms were discovered, of which 28,838 are genuinely meaningful words, a reported success rate of 88%.
Two clustering methods are compared for building the topic model: TFIDF-Kmeans and LDA. With TFIDF-Kmeans, the number of texts far exceeds the number of issue terms, so the TFIDF matrix is too sparse and the clustering is poor. With LDA, because topics are shared across documents, topic classification accuracy exceeds 80%. The research therefore concludes that LDA performs better for clustering issue terms when the texts cover multiple topics.
By combining different time intervals, the research presents the sentiment tendency of Central News Agency news in the three months before and after each of Taiwan's last five presidential elections, and examines how the sentiment tendency of each topic category changes over the same periods. The sentiment index generally peaks on polling day, and the results for the last three presidential elections show that the sentiment values of the related party news level off after the election. The positive and negative sentiment statistics and the overall sentiment tendency show that, regardless of which party is in power, Central News Agency coverage of both the KMT and the DPP is positive and stable and does not particularly favor either party.
Keywords: Sentiment analysis; LDA topic model; N-gram; A-priori
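The abstract above names "n-gram with a-priori" for lexicon expansion but does not give the procedure. The following is a minimal sketch of one common reading, not the thesis's actual implementation: character n-grams are counted level by level, and an n-gram is only counted if both of its (n-1)-gram substrings were already frequent (the Apriori property). The function name `mine_candidate_words` and the parameters `max_n` and `min_support` are hypothetical.

```python
from collections import Counter


def mine_candidate_words(sentences, max_n=3, min_support=2):
    """Return {ngram: count} for character n-grams (n >= 2) that survive
    Apriori-style pruning. Names and thresholds are illustrative assumptions."""
    frequent = {}
    # Level 1: single characters that meet the support threshold.
    counts = Counter(ch for s in sentences for ch in s)
    prev = {g for g, c in counts.items() if c >= min_support}
    for n in range(2, max_n + 1):
        counts = Counter()
        for s in sentences:
            for i in range(len(s) - n + 1):
                gram = s[i:i + n]
                # Apriori property: count a candidate only if both of its
                # (n-1)-gram substrings were frequent at the previous level.
                if gram[:-1] in prev and gram[1:] in prev:
                    counts[gram] += 1
        current = {g: c for g, c in counts.items() if c >= min_support}
        frequent.update(current)
        prev = set(current)
        if not prev:
            break  # no frequent n-grams left to extend
    return frequent


corpus = ["中央社報導", "中央社新聞", "新聞報導"]
candidates = mine_candidate_words(corpus)
```

On this toy corpus the surviving candidates are 中央, 央社, 報導, 新聞 and 中央社. Fragments such as 央社 still pass, which is consistent with the abstract's point that only part of the discovered terms are genuinely meaningful and a further filtering step is needed.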
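2	股市趨勢預測之研究 -財經評論文本情感分析 / Predict the trend in the stock by Sentiment analyzing financial posts 蔡宇祥, Tsai, Yu Shiang Unknown Date (has links)
Previous research indicates that posts on social networking sites influence public sentiment and, in turn, stock market fluctuations. For investors, quickly analyzing large volumes of financial text from social networking sites to infer investor sentiment and predict stock market trends can therefore improve returns. Earlier work on text sentiment analysis has shown that supervised learning achieves good classification through simple quantification, but its training data must carry predefined, known classes, so it cannot anticipate unknown classes. This research therefore uses a deep learning method to extract stock-market-related articles from a large data set and applies a sentiment analysis method for financial text that mixes supervised and unsupervised learning. Unsupervised learning is used to identify topics, compute sentiment indices, and label the sentiment orientation of Weibo financial posts. Supervised learning is used to build a classification model that predicts the trend of the Shanghai index. Finally, trend-line charts produced with visualization tools are used to find topics that behave as leading indicators.
In the experiments, word2vec was used to capture relevant stock-market articles, which effectively filtered the texts to be analyzed. For topic modeling, LDA was ultimately used for topic labeling: because the number of texts exceeds the number of issue terms, the TFIDF matrix is too sparse and K-means clustering performs poorly. For sentiment-orientation labeling, the expanded sentiment lexicon scores word polarity better than NTUSD, and the trend line of the computed sentiment index can effectively predict the trend of the Shanghai index. Not every topic's sentiment index has a leading property: only the sentiment indices of the company performance and Shanghai index topics reflect the Shanghai index trend in advance, so the classification model is built from the sentiment indices of texts on these two topics. The classification model that uses the sentiment index is 7% more accurate than the one that uses index indicators alone. This confirms that sentiment analysis can effectively improve the accuracy of Shanghai index trend prediction and help investors increase their stock market returns.
Keywords: Sentiment analysis; Word2vec; LDA topic model; K-means; Shanghai Stock Index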
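The abstract above refers to a sentiment index computed from a sentiment lexicon but does not state the formula. As a hedged illustration only, one simple lexicon-based index is the difference between positive and negative term counts divided by their sum. The formula, the function names `sentiment_index` and `label_orientation`, and the toy English lexicon are all assumptions, and the input is assumed to be already segmented into tokens.

```python
def sentiment_index(tokens, positive, negative):
    """Assumed index: (pos - neg) / (pos + neg), in [-1, 1].

    Returns 0.0 when the text contains no lexicon terms at all.
    """
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def label_orientation(index, threshold=0.0):
    """Map an index to a coarse orientation label (threshold is hypothetical)."""
    if index > threshold:
        return "positive"
    if index < -threshold:
        return "negative"
    return "neutral"


# Toy lexicon and post, for illustration only.
POSITIVE = {"profit", "rise", "growth"}
NEGATIVE = {"loss", "fall", "risk"}
post = ["profit", "rise", "despite", "loss"]
index = sentiment_index(post, POSITIVE, NEGATIVE)
orientation = label_orientation(index)
```

Averaging such per-post indices by day and by topic would give a daily series that can be plotted against a market index, which is the kind of trend-line comparison the abstract describes.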
3	應用情感分析於指數型證券投資信託基金趨勢預測之研究 / Research into sentimental analysis to predict exchange-traded fund trend 黃泓銘, Huang, Hung-Ming Unknown Date (has links)
ETF assets have grown rapidly in recent years, and steady economic growth in Asia has been a driving force in the international ETF market. The Yuanta Taiwan Top 50 ETF is favored by investors because of its large scale. Past research has shown that text on the Internet, such as news and tweets, influences public emotion, and that public emotion in turn affects stock prices. For investors, quickly analyzing the sentiment in large volumes of online financial text to predict price movements can therefore raise returns. However, hundreds of financial texts are produced every day, and manual analysis cannot keep up. This research uses text mining to propose a price prediction model based on sentiment analysis.
Past sentiment analysis research has shown that supervised methods achieve high accuracy, but they are limited in anticipating the unknown. To address this, the research combines supervised and unsupervised methods. Unsupervised learning is applied to financial texts from the whole of 2016 to identify document topics, compute a sentiment index, and label each document's sentiment polarity. Supervised learning then combines the sentiment index with Taiwan stock market indicators, international indicators, macroeconomic indicators, and technical indicators to build a classification model that predicts the price trend of the Yuanta Taiwan Top 50 ETF.
For topic labeling, the number of texts far exceeds the number of issue terms, so the TF-IDF matrix is too sparse and the TF-IDF with K-means model classifies poorly. The LDA topic model, in which all topics are shared by all documents, clusters terms better than TF-IDF with K-means. For sentiment labeling, the expanded sentiment lexicon judges word polarity better than NTUSD. The classification model that combines the sentiment index with technical indicators is 7% more accurate than the model that uses technical indicators alone, and adding indirect sentiment indicators raises accuracy to 71%. These results confirm that sentiment analysis of financial text can effectively improve price trend prediction for the Yuanta Taiwan Top 50 ETF.
Keywords: Sentimental analysis; LDA topic model; SVM; ETF
4	應用情感型態分析於指數股票型基金趨勢研究-以台灣卓越50基金為例 / A study on the trend of exchange traded funds by sentiment pattern analysis in Yuanta Taiwan Top 50 ETF 林詠翔, Lin, Yong-Xiang Unknown Date (has links)
Research indicates that ETF assets have grown rapidly in recent years, and the Yuanta Taiwan Top 50 ETF is popular with investors because of advantages such as its large market scale. The development of big data has made text mining technology mature, so this research proposes a price prediction model based on sentiment analysis to raise investors' rate of return. Earlier text mining research used single terms in a document as the unit of analysis, which often leads to problems with synonyms and polysemy. This research therefore proposes a supervised learning method based on sentiment pattern analysis. In addition, because training data for supervised learning is hard to obtain, it also uses unsupervised learning for topic clustering and sentiment-orientation labeling.
The research builds a dataset of Taiwan stock market news and filters a lexicon of popular issue terms, which is then used in an unsupervised LDA topic model. The results show that during the 2016 presidential election the media paid less attention to company-related issues, so the number of related texts dropped sharply. In the sentiment labeling stage, a sentiment lexicon that mixes NTUSD, the HowNet knowledge database, and a self-expansion algorithm assigns a polarity to 10% of neutral terms and labels the sentiment orientation of 96% of the documents. Data visualization shows that the DIF-MACD curve can predict the long-term trend of the 0050 fund, while the news sentiment index performs well on short-term price volatility. Among the topic clusters, the news sentiment indices of the macroeconomy and company operations categories lead the price by about 1 to 2 days, which helps the subsequent price prediction model.
For supervised sentiment analysis, the research applies the Pattern Taxonomy Model (PTM) to Chinese text to address the synonym and polysemy problems, and compares it with the vector space model (VSM) and the support vector machine (SVM). The experiments show that the optimized PTM combined with the Taiwan Weighted Stock Index performs relatively well, with an F1-Measure of up to 85%. The research also finds that news sentiment during non-trading time can influence the price volatility of 0050.
Keywords: Sentimental analysis; LDA topic model; Pattern model; ETF
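The DIF-MACD curve mentioned above is a standard technical indicator, and the following sketch shows how it is commonly computed. It assumes the conventional 12/26/9 spans and seeds each exponential moving average with the first value; the thesis does not state its parameters, and other seeding conventions exist. The function names are illustrative.

```python
def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1).

    Seeded with the first value (one of several common conventions).
    """
    if not values:
        return []
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def dif_macd(prices, fast=12, slow=26, signal=9):
    """Return (DIF, MACD, DIF - MACD) series for a list of closing prices."""
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    dif = [f - s for f, s in zip(fast_ema, slow_ema)]   # fast EMA minus slow EMA
    macd = ema(dif, signal)                             # signal line: EMA of DIF
    hist = [d - m for d, m in zip(dif, macd)]           # the DIF-MACD curve
    return dif, macd, hist


# Illustrative prices only, not real 0050 data.
closes = [float(p) for p in range(1, 41)]
dif, macd, hist = dif_macd(closes)
```

For a steadily rising series the fast average stays above the slow one, so DIF is positive; a sign change in the DIF-MACD series is what traders usually read as a trend reversal signal.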
5	運用財經文本情感分析於台灣電子類股價指數趨勢預測之研究 / Research of applying Sentimental Analysis on financial documents to predict Taiwan Electronic Sub-Index trend 劉羿廷 Unknown Date (has links)
The electronics industry is the most competitive industry in Taiwan, and electronics stocks account for as much as 69.49% of trading on the centralized market, so their fluctuations can strongly influence the whole Taiwan stock market. Many studies show that text on the Internet spreads quickly through social networks and affects public emotion, which in turn affects stock prices. For investors, quickly analyzing large volumes of online financial text to infer investor sentiment and predict the stock trend can therefore improve returns. However, nearly a hundred financial texts are produced every day, and traditional manual sampling is inefficient and cannot handle this volume. Past sentiment analysis research has shown that supervised methods reach high accuracy, but their training data must carry predefined classes, so they cannot identify unknown topics that may exist in the texts.
This research proposes a sentiment analysis method for financial text that mixes supervised and unsupervised learning. First, unsupervised learning is applied to electronics-industry financial texts from the whole of 2014 to identify document topics, compute a sentiment index, and label each document's sentiment orientation. Trend-line charts produced with visualization tools are then used to find the topics whose sentiment indices are leading indicators. Finally, supervised learning integrates the sentiment index with 24 other indirect sentiment indices, including international, macroeconomic, Taiwan stock market, and technical indicators, to build a classification model that predicts the trend of the Taiwan Electronic Sub-Index (TE).
For topic labeling, the number of texts far exceeds the number of issue terms, so the TFIDF matrix is too sparse and the TFIDF-Kmeans model classifies poorly. Because the texts have multiple topics, the issue terms clustered by NPMI-Concor are too complex to summarize. The LDA topic model, in which all topics are shared by all documents, outperforms both TFIDF-Kmeans and NPMI-Concor in term clustering and topic classification, with a classification accuracy of 98%, so LDA is used for topic labeling. For sentiment labeling, the expanded sentiment lexicon judges word polarity more accurately than NTUSD, and the trend line of the computed sentiment index follows the TE more closely than the MACD line that investors commonly use. Not all sentiment indices are leading indicators: only those of documents about enterprise operation and macroeconomics reflect the TE trend in advance, so these two are used to build the prediction model. The classification model that combines the sentiment index with technical indicators is 7% more accurate than the model that uses technical indicators alone, and adding indirect sentiment indicators raises accuracy to 71%. Sentiment analysis can therefore make prediction of the Taiwan Electronic Sub-Index trend more accurate and improve investors' return on investment.
Keywords: Sentimental analysis; Big Data; LDA topic model; SVM; Taiwan Electronic Sub-Index Trend
6	以推敲可能性模式探討影響評論幫助性之因素 / Factors Affecting Review Helpfulness : An Elaboration Likelihood Model Perspective 熊耿得, Hsiung, Keng-Te Unknown Date (has links)
Online reviews are important factors in consumers' purchase decisions in e-commerce, and review helpfulness allows consumers to quickly identify the key reviews. This study uses the elaboration likelihood model as its framework and applies text mining to the textual characteristics of reviews to investigate what affects their helpfulness. For the central cues, we adopt latent Dirichlet allocation to measure review breadth, in addition to review length and review readability. For the peripheral cues, we use sentiment analysis based on the circumplex model to capture the effect of emotion, and the ranking of the reviewers to measure source credibility. We evaluate the model on a dataset collected from Amazon.com.
The results suggest that consumers attend to both central and peripheral cues when judging review helpfulness. Central cues with high argument quality, namely length, breadth, and readability, effectively raise helpfulness. The peripheral cues confirm a negativity bias: negative emotion with high arousal more readily raises helpfulness, and perceived helpfulness is indeed affected by the reviewer's ranking. In a further analysis, we split the sample into three groups by reviewer ranking. Top reviewers should stay neutral and avoid personal feelings to make their reviews more helpful; middle reviewers can use more arousing words to improve helpfulness; bottom reviewers must increase the strength of their emotional valence, especially negative emotion, to raise perceived helpfulness.
Keywords: Review helpfulness; Elaboration likelihood model; Latent Dirichlet allocation; Circumplex model; Sentiment analysis
7	應用文本主題與關係探勘於多文件自動摘要方法之研究：以電影評論文章為例 / Application of text topic and relationship mining for multi-document summarization: using movie reviews as an example 林孟儀 Unknown Date (has links)
The spread of the Internet has dramatically increased the amount of online information, and searching, organizing, and reading it takes users a great deal of time. This thesis presents a method that uses text topic and relationship mining to generate a single summary from multiple documents, so that users can grasp what the documents convey quickly without reading them in full.
We use movie reviews as an example and apply the concept of article structure to divide a review summary into three parts: film information, plot introduction, and conclusion. Film information and conclusion are identified by matching against a movie-domain lexicon built in this thesis. The remaining paragraphs are assigned to plot introduction and clustered into topics with Latent Dirichlet Allocation (LDA). Next, we apply the concept of a text relationship map, a network in which each node is a paragraph and each edge indicates that two paragraphs are related, to select and order the most representative paragraph in each topic. Finally, we remove conjunctions from each selected paragraph and replace pronouns with the names they refer to, producing a bullet-point summary.
For the three movies in the experiment, the generated summaries cover more topics and are more diverse. Their similarity to the best-sample summaries increases by 10.8228%, 14.0123%, and 25.8142% respectively. The method thus captures the key content effectively and generates a more comprehensive summary. It lets users aggregate movie reviews automatically into a concise summary, reducing the time spent searching for and reading articles.
Keywords: Text mining; Multi-document summarization; LDA topic model; Text relationship map
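The text relationship map described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: paragraphs are assumed to be pre-tokenized, relatedness is measured by cosine similarity of raw term counts, an edge is drawn when similarity reaches a hypothetical `threshold`, and the representative paragraph is taken to be the node with the most edges. The abstract does not specify the similarity measure or the selection rule.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_connected_paragraph(paragraphs, threshold=0.5):
    """Build the map and return (index of highest-degree node, all degrees).

    `paragraphs` is a non-empty list of token lists; ties go to the earliest
    paragraph. The threshold value is an illustrative assumption.
    """
    vectors = [Counter(p) for p in paragraphs]
    degree = [0] * len(vectors)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                degree[i] += 1  # undirected edge: both endpoints gain a link
                degree[j] += 1
    best = max(range(len(degree)), key=degree.__getitem__)
    return best, degree


# Toy paragraphs from one hypothetical topic cluster.
cluster = [
    ["hero", "saves", "city"],
    ["hero", "city", "villain"],
    ["villain", "plot", "city"],
    ["popcorn", "seats"],
]
best, degree = most_connected_paragraph(cluster)
```

Here the second paragraph shares two of three terms with each of its neighbours and ends up with the most links, so it would be chosen as the representative of the cluster. Running this once per LDA topic gives one paragraph per topic for the bullet-point summary.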

