Global ETD Search

1	應用情感分析於輿情之研究-以台灣2016總統選舉為例 / A Study of using sentiment analysis for emotion in Taiwan's presidential election of 2016 陳昭元, Chen, Chao-Yuan Unknown Date (has links) 從2014年九合一選舉到今年總統大選，網路在選戰的影響度越來越大，候選人可透過網路上之熱門討論議題即時掌握民眾需求。文字情感分析通常使用監督式或非監督式的方法來分析文件，監督式透過文件量化可達很高的正確率，但無法預期未知趨勢，耗費人力標注文章。本研究針對網路上之政治新聞輿情，提出一個混合非監督式與監督式學習的中文情感分析方法，先透過非監督式方法標注新聞，再用監督式方法建立分類模型，驗證分類準確率。在實驗結果中，主題標注方面，本研究發現因文本數量遠大於議題詞數量造成TFIDF矩陣過於稀疏，使得TFIDF-Kmeans主題模型分類效果不佳；而NPMI-Concor主題模型分類效果較佳但是所分出的議題詞數量不均衡，然而LDA主題模型基於所有主題被所有文章共享的特性，使得在字詞分群與主題分類準確度都優於TFIDF-Kmeans和NPMI-Concor主題模型，分類準確度高達97%，故後續採用LDA主題模型進行主題標注。情緒傾向標注方面，證實本研究擴充後的情感詞集比起NTUSD有更好的字詞極性判斷效果，並且進一步使用ChineseWordnet 和 SentiWordNet，找出詞彙的情緒強度，使得在網友評論的情緒計算更加準確。亦發現所有文本的情緒指數皆能反應民調指數，故本研究用文本的情緒指數來建立民調趨勢分類模型。在關注議題分類結果的實驗，整體正確率達到95%，而在民調趨勢分類結果的實驗，整體正確率達到85%。另外建立全面性的視覺化報告以瞭解民眾的正反意見，提供候選人在選戰上之競爭智慧。 / From the 2014 Taiwanese local elections to the 2016 presidential election, the internet has had a growing influence on campaigns. Candidates can grasp voters' needs in real time by following popular topics of online discussion. Sentiment analysis typically uses supervised or unsupervised methods to analyze text. Supervised learning can achieve high accuracy, but it cannot anticipate unknown trends and requires articles to be labeled manually. In this study, we propose a Chinese sentiment analysis method for online political news that combines unsupervised and supervised learning. First, we used unsupervised learning to label each article. Second, we used supervised learning to build a classification model and verified its accuracy. For topic labeling, we found that the TFIDF-Kmeans model classifies poorly because the number of documents far exceeds the number of topic words, which makes the TFIDF matrix too sparse. The NPMI-Concor model performs better than TFIDF-Kmeans, but the topic words it produces are unevenly distributed across topics. The LDA model, in which all topics are shared by all articles, outperforms both in word clustering and topic classification, reaching 97% classification accuracy. 
We therefore chose the LDA model to label article topics. For sentiment labeling, the sentiment dictionary we built judges word polarity more accurately than NTUSD. We also used ChineseWordnet and SentiWordNet to estimate the sentiment strength of each word, which makes the computed sentiment of online comments more accurate. Because the sentiment index of the texts reflects the poll numbers, we used it to build a classification model of poll trends. Overall accuracy reached 95% for topic classification and 85% for poll-trend classification. We also built a visualization report so that candidates can see the public's positive and negative opinions, giving them competitive intelligence for the campaign. 情感分析 文字分類 支援向量機 Sentiment Analysis Text Classification SVM
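The lexicon-based sentiment index mentioned in the abstract above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the toy lexicon, its strength values, the averaging rule, and the names `sentiment_index` and `label` are all hypothetical, and the input is assumed to be already word-segmented.

```python
from typing import Dict, List

# Hypothetical toy lexicon: word -> signed sentiment strength in [-1, 1].
# A real lexicon (for example an expanded NTUSD) would be far larger.
LEXICON: Dict[str, float] = {"支持": 0.8, "讚": 0.6, "失望": -0.7, "反對": -0.5}


def sentiment_index(tokens: List[str], lexicon: Dict[str, float]) -> float:
    """Mean signed strength of the tokens found in the lexicon (0.0 if none match)."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def label(index: float, threshold: float = 0.0) -> str:
    """Map a sentiment index to a coarse orientation label."""
    if index > threshold:
        return "positive"
    if index < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    tokens = ["我", "支持", "但", "失望"]  # assumed to be pre-segmented
    idx = sentiment_index(tokens, LEXICON)
    print(idx, label(idx))
```

Averaging over matched words only is one possible design choice; dividing by the total token count instead would penalize long texts with few sentiment words.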
2	應用情感分析於媒體新聞傾向之研究-以中央社為例 / Applying sentiment analysis to the tendency of media news: a case study of central news agency 吳信維, Wu, Xin-Wei Unknown Date (has links) 本研究目的在於結合關聯規則新詞發掘演算法來擴增詞庫，並藉此提高結斷詞句的精確度以及透過非監督式情感分析方法，從中央通訊社中抓取國民黨以及民進黨的相關新聞文本，建立主題模型與情緒傾向的標注。再藉由監督式學習方法建立分類模型並驗證其成果。　　本研究藉由n-gram with a-priori algorithm來進行斷詞斷句的詞庫擴增。共有32007組詞被發掘，於這些詞中具有真正意義的詞共有28838筆，成功率可達88%。　　本研究比較兩種分群方法建立主題模型，分別為TFIDF-Kmeans以及LDA。在TFIDF-Kmeans分群結果中，因為文本數量遠大於議題詞數量，造成TFIDF矩陣過於稀疏，造成分群效果不佳。在LDA的分群結果底下，因為LDA模型其多文章多主題共享的特性，主題分類的精準度更高達八成以上。故本研究認為在分析具有多主題特性之文本，採用LDA模型來進行議題詞分群會有較佳的表現。　　本研究透過結合不同的資料時間區間，呈現出中央通訊社的新聞文本在我國近五次總統大選前後三個月間的新聞情緒傾向。同時探討各主題模型中各類別於大選前後三個月之情緒傾向變化。可以觀察到大致上文本的情感指數高峰值會出現於投票日的時候，而近三次總統大選的結果顯示，相關的政黨新聞情感值會於選舉過後趨於平緩。而從新聞文本的正負向情感統計以及整體情緒傾向分析可以看出，不論執政黨為何，中央通訊社的新聞對於國民黨以及民進黨皆呈現了正向且平穩的內容，大抵不會特別偏向單一政黨 / The purpose of this research is to expand the lexicon with a new-word discovery algorithm based on association rules, thereby improving the accuracy of word segmentation, and to collect news about the KMT and the DPP from the Central News Agency in order to build topic models and label sentiment orientation with an unsupervised sentiment analysis method. Supervised learning is then used to build classification models and verify the results. 　　This research uses n-grams with the a-priori algorithm to expand the word-segmentation lexicon. A total of 32007 candidate words were found, of which 28838 are meaningful words; the reported success rate is up to 88%. 　　We compare two clustering methods for building the topic model: TFIDF-Kmeans and LDA. With TFIDF-Kmeans, the number of texts is far larger than the number of topic words, so the TFIDF matrix is too sparse and clustering is poor. With LDA, because topics are shared across articles, topic classification accuracy exceeds 80%. 
Therefore, this research suggests that the LDA model performs better for clustering topic words when the texts cover multiple topics. 　　By combining different time intervals, this research presents the sentiment tendencies of Central News Agency news in the three months before and after each of Taiwan's last five presidential elections. It also explores how the sentiment tendency of each category in the topic models changes over those three months. In general, the sentiment index of the texts peaks around polling day, and the last three presidential elections show that the sentiment values of party-related news level off after the election. The positive and negative sentiment statistics and the overall sentiment analysis show that, regardless of which party is in power, Central News Agency coverage of both the KMT and the DPP is positive and stable and does not particularly favor either party. 情感分析 LDA主題模型 n-gram a-priori Sentiment analysis LDA N-gram A-priori
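As a rough illustration of the "n-gram with a-priori" idea named in the abstract above, the sketch below counts character n-grams level by level and applies a-priori pruning: an n-gram is counted only if both of its (n-1)-gram sub-sequences were already frequent. The function name `frequent_ngrams`, the `max_n` and `min_support` parameters, and the sample sentences are assumptions for illustration; the thesis's actual procedure and thresholds are not given in the abstract.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set


def frequent_ngrams(sentences: Iterable[str], max_n: int = 3,
                    min_support: int = 2) -> Dict[str, int]:
    """Return character n-grams (1..max_n) occurring at least min_support times.

    A-priori pruning: an n-gram is only counted when its leading and trailing
    (n-1)-grams were both frequent at the previous level.
    """
    sentences = list(sentences)
    frequent: Dict[str, int] = {}
    prev: Optional[Set[str]] = None  # frequent (n-1)-grams from the previous level
    for n in range(1, max_n + 1):
        counts: Counter = Counter()
        for s in sentences:
            for i in range(len(s) - n + 1):
                gram = s[i:i + n]
                if prev is not None and (gram[:-1] not in prev or gram[1:] not in prev):
                    continue  # pruned: some sub-sequence is already infrequent
                counts[gram] += 1
        level = {g: c for g, c in counts.items() if c >= min_support}
        if not level:
            break  # no frequent n-grams, so no longer ones can be frequent either
        frequent.update(level)
        prev = set(level)
    return frequent


if __name__ == "__main__":
    corpus = ["民進黨主席", "民進黨提名", "國民黨主席"]  # hypothetical sample text
    result = frequent_ngrams(corpus)
    print({g: c for g, c in result.items() if len(g) == 3})
```

Frequent n-grams found this way are only candidates; the abstract reports that the candidates were further checked for whether they are meaningful words.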
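3	股市趨勢預測之研究 -財經評論文本情感分析 / Predict the trend in the stock by Sentiment analyzing financial posts 蔡宇祥, Tsai, Yu Shiang Unknown Date (has links) 根據過去研究指出，社群網站上的貼文訊息會對群眾情緒造成影響，進而影響股市波動，故對於投資者而言，如果能快速分析大量社群網站的財經文本來推測投資情緒進而預測股市走勢，將可提升投資獲利。過去文本情感分析的研究中已證實監督式學習方法可以透過簡單量化的方式達到良好的分類效果，但監督式學習方法所使用的訓練資料集須有事先定義好的已知類別，故其有無法預期未知類別的限制，所以本研究透過深度學習方法，從巨量資料集裡抓出有關於股市之文章，並透過財經文本的混合監督式學習與非監督式學習之情感分析方法，透過非監督式學習對微博財經貼文進行文本主題判別、情緒指數計算與情緒傾向標注，並且透過監督式學習的方式，建立分類模型以預測上海指數走勢，最後配合視覺化工具作趨勢線圖分析，找出具有領先指標特性之主題。在實驗結果中，深度學習方面，本研究透過word2vec抓取有效之股市主題文章，有效篩選了需要分析之文本，主題模型方面，我們最後使用LDA作為本研究標註主題之方法，因為其文本數量大於議題詞數量造成TFIDF矩陣過於稀疏，造成Kmeans分群效果不佳，故後續採用LDA主題模型進行主題標注。情緒傾向標注方面，透過擴充後的情感詞集比起NTUSD有更好的詞性分數判斷效果，計算出的情緒指數之趨勢線能有效預測上海指數之趨勢。此外，並非所有主題模型之情緒指數皆具有領先特性，僅公司表現與上海指數之主題模型的情緒指數能提前反應上海指數趨勢，故本研究用此二主題之文本的情緒指數來建立分類模型。本研究透過比較情緒指數與單純指數指標分類模型的準確度，前者較後者高出7%的準確率。故證實了情感分析確實能有效提升上海指數趨勢預測準確度，幫助投資者增加股市報酬率。 / Previous studies indicate that posts on social networking sites affect public sentiment and, in turn, stock market fluctuations. For investors, quickly analyzing large volumes of financial text from social networks to infer investor sentiment and predict market movements could therefore improve returns. Earlier work on text sentiment analysis has shown that supervised learning achieves good classification with simple quantification, but its training data must carry predefined classes, so it cannot anticipate unknown classes. This study therefore uses a deep learning method to extract stock-related articles from a large dataset and applies a sentiment analysis method for financial text that combines supervised and unsupervised learning. Unsupervised learning is used to identify the topics of Weibo financial posts, compute sentiment indices, and label sentiment orientation; supervised learning is then used to build a classification model that predicts the trend of the Shanghai index; finally, trend-line charts produced with visualization tools are analyzed to find topics that act as leading indicators. In the experiments, word2vec effectively retrieved stock-related articles and filtered the texts to be analyzed. For topic modeling, the number of texts exceeded the number of topic words, which made the TFIDF matrix too sparse and K-means clustering poor, so LDA was adopted for topic labeling. For sentiment labeling, the expanded sentiment lexicon scored words better than NTUSD, and the trend line of the computed sentiment index effectively predicted the trend of the Shanghai index. Not every topic's sentiment index is a leading indicator: only the indices of the company-performance and Shanghai-index topics anticipate the Shanghai index trend, so the classification model is built from the sentiment indices of texts on these two topics. The classification model using the sentiment index is 7% more accurate than the one using index indicators alone, which confirms that sentiment analysis can improve prediction of the Shanghai index trend and help investors increase their returns. 情感分析 Word2vec LDA主題模型 K-means 上海股價指數 Sentiment analysis Word2vec LDA topic model K-means Shanghai Stock Index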
4	口碑情感對於募資專案之影響 / The Influence of eWOM Sentiment on the Success of Crowdfunding Projects 林漢文 Unknown Date (has links) 「群眾募資」為社會大眾透過小額資金的贊助，發揮群體集結的力量，支持個人或組織使其目標或專案得以執行完成。隨著群眾募資平台的出現，加速了群眾募資的發展，從國外知名的Kickstarter 到國內的Flyingv，這股募資的旋風一路席捲了國內外傳統借貸生態。然而募資專案的成功因素也變成了一個重要的課題，過去關於募資專案的文獻大多提到募資金額、募資更新次數等因素，較少著墨於投資者對於募資產品的評論或口碑因素。因此本研究提出一個更廣泛的整合架構，針對網路評論做情感分析作為影響募資專案成功的重要因素之一，並對 Kickstarter 上的專案，進行實證研究，結果發現口碑的數量及情感因素在不同類別的專案中有不同的影響。在Game, Technology 和Design 類別對募資專案成功有顯著的影響，但是在Music, Theater 和Dance 專案則沒有顯著影響。 / Crowdfunding is defined as a process or activity that openly solicits small amounts of money from a group of people or organizations so that a goal or project can be carried out. The emergence of crowdfunding platforms in recent years has accelerated its popularity. From Kickstarter to Flyingv, the crowdfunding trend has changed the traditional lending landscape. However, not all crowdfunding projects succeed: a substantial number fail because they cannot raise the target amount. It is therefore worth investigating the factors that may affect the success of a fundraising project. Previous literature has reported several success factors, such as the target amount and the number of updates, but few studies have examined the effect of project reviews. Word of mouth clearly plays an important role in consumer decisions, and it is reasonable to believe that project reviews, as a kind of word of mouth, affect investors' decisions. Hence, this study adopts sentiment analysis to examine how the sentiment of project reviews, along with other factors, may affect eventual project success. Data collected from Kickstarter.com were used to evaluate our research model. 
Our findings indicate that the number and sentiment of project reviews did have an impact on fundraising success, but only in certain categories, such as game, design, and technology, that seem to have objective evaluation criteria. Their effect was not significant in categories such as music, theater, and dance, in which investors' preferences may be very subjective. 情感分析 關鍵成功因素 群眾募資 sentiment analysis success factor crowdfunding
5	應用情感分析於指數型證券投資信託基金趨勢預測之研究 / Research into sentimental analysis to predict exchange-traded fund trend 黃泓銘, Huang, Hung-Ming Unknown Date (has links) 近年來ETF規模快速成長，亞洲區域經濟成長與穩步發展更是帶動國際ETF市場動力來源，而元大台灣50指數型證券投資信託基金因規模大，受到投資人的青睞。根據過去的研究指出，網路上的文本訊息會對群眾情緒造成影響，進而影響股價波動，對投資者而言，若能從大量網路財金快速分析投資者大眾情緒進而預測股價波動走勢，勢必可提高報酬率。然而，每日有上百篇的財金文本產生，人工分析耗時耗力，本研究採用文字探勘技術，提出一套情感分析的價格預測模型。過去文本情感分析的研究中已證實監督式學習方法可以透過簡單量化的方式達到良好的分類效果，然而，為解決監督式學習無法預期未知的限制，本研究透過非監督式學習將2016整年度的財金文本進行文章主題判別，計算情緒指數並標記文本情緒傾向，再來使用監督式學習結合台股資訊指標、國際指標、總體經濟指標、技術指標等，建立分類模型以預測元大台灣50ETF的價格趨勢。實驗結果中，主題標注方面，本研究發現因文本數量遠大於議題詞數量造成TF-IDF矩陣過於稀疏，使得TF-IDF結合K-means主題模型分類效果不佳。LDA主題模型基於所有主題被所有文章共享的特性，使得在字詞分群優於TF-IDF結合K-means。情緒傾向標注方面，證實本研究擴充後的情感詞集比起NTUSD有更好的字詞極性判斷效果。本研究透過比較情緒指數結合技術指標之分類模型與單純技術指標分類模型的準確率發現，前者較後者高出7%的準確率。進一步結合間接情緒指標的分類模型更有71%準確率，故證實財金文本的情感分析確實能有效提升元大台灣50的價格趨勢預測。 / Rapid and stable economic growth in Asia motivated the asset scale of ETF in the globe growing rapidly in the recent years. Yuanta Taiwan Top 50 ETF gains the investors’ favor because of the advantages of large market scale. Past research have shown that the text documents on the internet, e.g. news and tweets, would make great effect on public emotion, and the public emotion could even affect the stock price. For investors, it is important to know how to analyze the potential emotion in text documents to predict the stock trend. However, the traditional way to analyze text documents by human cannot afford the large volume of financial text documents on the internet. In past sentimental analysis research, supervised method is proven as a method with high accuracy, but there are limits about predicting unknown future trend. This research combined supervised and unsupervised methods to deal with these large financial text documents. By using unsupervised method to find out the topic of documents, and then calculate the sentimental index of each documents to differentiate the sentiment polarity. 
A supervised method is then used to build a prediction model from the sentiment index. The results show that the LDA model performs better than TF-IDF with K-means. Moreover, the prediction model that includes the sentiment index is more accurate than the one that includes only technical indicators. 情感分析 LDA主題模型 支援向量機 ETF Sentimental analysis LDA SVM ETF
6	應用情感型態分析於指數股票型基金趨勢研究-以台灣卓越50基金為例 / A study on the trend of exchange traded funds by sentiment pattern analysis in Yuanta Taiwan Top 50 ETF 林詠翔, Lin, Yong-Xiang Unknown Date (has links) 根據研究指出 ETF 資產規模近幾年快速成長，元大台灣卓越 50 基金因市場規模大等優勢受到投資人的青睞，賴以巨量資料的發展使得文字探勘技術成熟，故本研究希冀提出一套情感分析的價格預測模型，提升投資者的報酬率。過往學者以文章中的單詞作為文字探勘的分析單位，常會產生同義詞、多義詞的問題，因此提出情感型態分析的監督式學習方法建立模型。另外為了解決監督式學習難以取得訓練資料的限制，本研究混合非監督式學習方法進行主題分群與情緒傾向標注。本研究建立台灣股市新聞文本資料集，並篩選熱門議題詞詞庫，進行非監督式的 LDA 主題模型，發現在 2016 年總統選舉期間，媒體對於公司相關議題的注意力降低，使得相關的文本數量大幅減少;另外在情緒傾向標注階段，因混和了 NTUSD、知網及自行擴充演算法的情感詞庫，能夠將 10%中性詞彙產生極性判斷、96%的文本標注情緒傾向。視覺化工具分析結果指出，DIF-MACD 能夠預測台灣卓越 50 基金的長期走勢，而新聞情緒指數則在短期的價格波動上表現良好，且在主題模型分群中，總體經濟、公司維運類別的新聞情緒指數具有約 1-2 日領先指標特性，對於後續的價格預測模型有所助益。在監督式情感分析方法，為解決上述同義詞、多義詞的問題，本研究採用型態分類模型於中文文本，並與向量空間模型、支援向量機等方法做比較。實驗結果指出優化的型態分類模型，並結合台灣加權股價指數，表現相對良好，F1- Measure 可達 85%。進一步討論新聞情緒對於價格預測的重要性，發現在非交易時間序列中的新聞情緒，能夠對 0050 的價格波動產生影響。 / The past research points out that the scale of ETF assets has been growing rapidly in recent years. Yuanta Taiwan Top 50 ETF is popular with investors because of the advantages of large market scale. Through the development of Big Data, the technology of Text Mining becomes mature. Thus, we analyze the price forecast model to raise the investors' rate of return. The research of Text Mining used to take the document term to analyze, but it often results in the problem with synonym and polysemy. Therefore, this research proposes a supervised learning method of sentiment pattern analysis. In addition, in order to solve the problem with training data about the supervised learning method, we mix the unsupervised learning method to carry out the subject grouping and sentimental tendency. In this study, we establish the news dataset and screen it as popular terms that are used to an unsupervised method of LDA model. The result points out that the number of news about company dropped significantly during the 2016 Taiwan president election because of the change of media sensation. 
Moreover, by combining NTUSD, the HowNet knowledge database, and a self-expansion algorithm, we create a sentiment dictionary that can assign a polarity to 10% of neutral terms and a sentiment tendency to 96% of documents. Data visualization shows that the DIF-MACD curve can predict the long-term trend of 0050, while the news sentiment index performs well on short-term price volatility. In addition, the news sentiment indices of the macroeconomy and company-operations topics act as leading indicators by about 1 to 2 days. Finally, we apply the sentiment Pattern Taxonomy Model (PTM) to Chinese texts as the supervised learning method and compare it with VSM and SVM. The experiments show that the optimized PTM combined with the Taiwan Weighted Stock Index performs best, with an F1-measure of up to 85%. We also find that news sentiment during non-trading hours can influence the price volatility of 0050. 情感分析 LDA主題模型 型態模型 指數股票型基金 Sentimental analysis LDA Pattern model ETF
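For readers unfamiliar with the DIF-MACD indicator referred to in the abstract above, a minimal sketch follows. It assumes the conventional definition, where DIF is a fast EMA of closing prices minus a slow EMA and MACD is an EMA of DIF, with the customary 12/26/9 periods; the abstract does not state the periods or the exact variant used, so these are assumptions, as are the function names and the price series.

```python
from typing import List, Tuple


def ema(values: List[float], period: int) -> List[float]:
    """Exponential moving average, seeded with the first value (a common simplification).

    Assumes `values` is non-empty.
    """
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def dif_macd(closes: List[float], fast: int = 12, slow: int = 26,
             signal: int = 9) -> Tuple[List[float], List[float], List[float]]:
    """Return (DIF, MACD signal line, DIF minus MACD) for a closing-price series."""
    dif = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    macd = ema(dif, signal)
    hist = [d - m for d, m in zip(dif, macd)]
    return dif, macd, hist


if __name__ == "__main__":
    closes = [float(p) for p in range(1, 41)]  # hypothetical, steadily rising prices
    dif, macd, hist = dif_macd(closes)
    print(round(dif[-1], 4), round(macd[-1], 4), round(hist[-1], 4))
```

On a steadily rising series the fast EMA stays above the slow EMA, so DIF is positive; on a flat series all three outputs stay at zero.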
7	網路評價搜尋結果的正負意見分類系統 / A sentiment classification system on search results of web opinions 黃泓彰, Huang, Hung Chang Unknown Date (has links) 本研究嘗試建置一個包含兩個主要功能的系統，分別是網路評價搜尋以及情感分類。在網路評價搜尋的部份，我們使用Google搜尋並蒐集一攜帶型智慧裝置（智慧型手機、平板電腦與筆記型電腦）的網路評價搜尋結果；情感分類的部分則是將搜尋結果依照對該產品的意見分類為，共有正面／負面／中立、正面／負面、正面／非正面，以及負面／非負面等四種分類方式。為了建置此系統，我們首先從知名的網路論壇Mobile01和批踢踢蒐集和攜帶型智慧裝置有關的網路文章以及產品名稱，接著以人工的方式標記每篇文章，以及部分文章中的句子的情感。本研究設計了兩個層次的情感分類實驗，我們首先從語句層次出發，以監督式機器學習法訓練將句子分為正面／負面／中立等三個類別的分類模型後，再進入文章層次，將句子的意見彙整，並同樣以監督式機器學習法訓練四種不同文章層次的分類模型：正面／負面／中立、正面／負面、正面／非正面，以及負面／非負面。我們分別選出四種分類實驗中表現最佳的模型，並用於系統建置，其中表現最佳的是分類為正面／負面的分類模型，平均的F-measure為0.87；其次是分類為負面／非負面的模型，對負面類別的F-measure為0.83；接著是分類為正面／非正面的模型，對正面類別的F-measure為0.81；表現最差的是正面／負面／中立的分類，平均的F-measure為0.77。在正面／負面分類的準確率上，本研究的表現並不壞於過去以英文為主要語言的相關研究。最後，我們也以過去不經過語句層次的分類方法進行實驗並比較，其結果發現經過語句層次的情感分類比不經過語句層次的情感分類較佳。 / In this research, we implemented a system that retrieves the search results of mobile phones, tablets, and notebooks from Google, and then classifies them as: (1) positive, negative, or neutral, (2) positive or negative, (3) positive or non-positive, (4) negative or non-negative. To build this system, first we collected some documents about mobile phones, tablets, and notebooks on two popular web forums: mobile01.com and ptt.cc. Next, a sentiment label (positive, negative, or neutral) is attached to each document and each sentence of these documents. We designed a two-level supervised sentiment classification experiment. At sentence level, we trained classifiers that classify sentences as positive, negative, or neutral. The best sentence classifier was then used at document level. At document level, the sentiment labels of the sentences in documents are used. We trained classifiers in four different classification problems: (1) positive, negative, or neutral, (2) positive vs. negative, (3) positive vs. non-positive, (4) negative vs. non-negative. The best is the second classifier with an average F-measure of 0.87. 
Next is the fourth classifier, with an F-measure of 0.83 on the negative class, followed by the third classifier, with an F-measure of 0.81 on the positive class. Last is the first classifier, with an average F-measure of 0.77. On positive vs. negative classification, our accuracy is no worse than that of previous studies on English text. Finally, we ran another experiment using a document-level-only classification method, and the results showed that our two-level sentiment classification (sentence level first, then document level) outperforms document-level-only classification. 意見探勘 情感分析 情感分類 網路評價 Opinion mining Sentiment analysis Sentiment classification Web opinion
8	兩種中文情感運算分析策略：以部首為基礎及深層類神經學習 / Two Chinese Sentiment Analysis Approaches: Radical-based and Deep Learning Neural Network 趙逢毅, Chao, August F.Y. Unknown Date (has links) 評論是所有人類行為的核心，因為它影響我們行為的關鍵因素。我們都試著從不同型式的評論分析與研究試著從作者字裡行間的文字呈現內容深入推敲及理解，從而要能過濾出能協助決策的有用資訊。在早期的評論研究將評論視為是文本分類問題，直到2000年前後，從分析評論的主觀句子與評論裡形容詞的程度衡量用詞，學者們開始對解構整篇文本的內容，並試著從語言學的角度分析用字遣詞與情感方向之間的關聯。這種從文字語義關聯分析評論的方式，也使文本挖掘技術必需結合自然語言的處理原則，才能更準確地了解評論的內容。隨著許多新興的機器學習演算法與自然語言處理方法不斷地推陳出新，及網路使用行為拓展至電子商務與線上虛擬社群的建立，情感分析研究亦開始不斷地蓬勃發展。漢文不同於世界其它語言，它擁有許多獨特表徵：無空格區隔、一字一語素、依詞為語言中表達意義的最小獨立單位，也使得在套用源自西方的情感分析原則時更加困難。然而過去的研究者則加以利用這些語言特徵，建立出專屬中文的情感分析原則。我們務實地討論適用於中文情感分析的情境(a)可取得情感分析資源及專家語言智慧，及(b)可取得領域字詞特徵向量定義的兩個前題下，提出適合的中文情感分析策略。在情境(a)中，我們深入討論運用部首資訊至情感分析中的適用性，並且提出一套能精萃出領域評論文本的觀測字詞/部首組的方法。研究中我們萃取出50個部首組，並運用在領域相近的評論裡得到很好的情感分類成效。而在情境(b)中我們提出適合深層類神經網路學習方法的評論字詞的權重過濾原則，不僅能確保評論字詞在學習過程中仍保有能積旋出合適屬性，並且驗證此權重原則在支援向量機的學習方式下亦有相同的優勢。在研究中，我們亦討論此兩種情境下進行情感分析的必要條件與資訊，並為未來更深入的中文情感分析起到墊腳石的作用。 / Opinion is the core of human behaviors, because it directly influences key factor of our behaviors. Despite of personal or organizational decision making processes, we all constantly conduct various kinds of opinion analysis, including explaining and comprehending what users present. At the beginning, opinion studies considered as a text mining problems, and tried to cluster opinions into positive and negative groups. After 2000, researchers intended to decompose sentences from whole opinions by analysing subjective expressing and adjective words presenting within, as well as explained the relationships between semantics and sentiment from linguistics aspect. Therefore, opinion analysis has to incorporate with natural language processing techniques, so we can understand the opinion contents. Nowadays, sentiment analysis grows event booming due to emerging machine learning and natural language processing approaches, as well as the needs of electronic commerce and virtual community on line. 
Chinese differs from other languages in several ways: words are not separated by spaces, each character is a morpheme, and words (composed of several characters) are the smallest independent units of meaning. These features make it difficult to apply sentiment analysis principles developed for English. Nevertheless, researchers have leveraged these language features to propose sentiment analysis approaches dedicated to Chinese opinions. In this study, we discuss two practical scenarios for Chinese sentiment analysis: (a) sentiment analysis resources and expert linguistic knowledge are available; and (b) word feature vectors (word2vec) are available and deep learning is used. In scenario (a), we propose a Chinese radical-based sentiment analysis approach and test its applicability. We also propose a feature extraction method, which yields 50 radical groups (seeds) for further analysis. In scenario (b), we compare 4 feature selection approaches for deep learning, in order to maintain accuracy and ensure that interpretable features can be generated in the neural network. We also tested the feature selection approaches with an SVM classifier and obtained similar results. We further discuss the constraints and information required in both scenarios, so that the results can serve as a foundation for further Chinese sentiment analysis studies. 中文情感分析 部首資訊 深層學習 屬性選擇 屬性萃取 Chinese Sentiment Analysis Radical Information Deep Learning Feature Selection Feature Extraction
9	消費者輿情對跨境網購產品銷售量之影響：以淘寶網為例 / The Effects of Consumer Comments and Sentiments on Product Sales of Cross-border Shopping Websites: The Taobao Case 呂奕勳 Unknown Date (has links) 近年來傳統線上購物正面臨著一連串的市場困境，如削價競爭、廉價品競爭等，因此導致銷售量之成長趨緩，反觀跨境線上購物卻出現了蓬勃發展的態勢，因而讓跨境線上購物成為驅動經濟活動與國際貿易的新引擎。另一方面，由於跨境線上購物的情境複雜性遠高於傳統的境內線上購物，業者們欲開發一海外新市場，必須先了解該地消費者行為與其購買決策過程後，才能制定出好的商業策略，並且進一步將產品導向的服務轉化成為以顧客導向的服務，才有機會為傳統線上購物之困境另闢生機。因此，引取並了解消費者所體認的內在價值是經營跨境線上購物最重要的成功因素。本研究將試圖將傳統境內線上購物研究擴展到跨境線上購物議題，藉由文字探勘(Text Mining)分析、語意情感分析與 k-means 分群演算法，挖掘出消費者對於所購買商品之評論的常見內容型態與所購買商品之類別，並試圖找出跨境網購平台上各項因素及商品評論對於產品銷售量間之關連性，提供未來研究者及跨境網購平台業者決策之依據。 / While online shopping websites face price competition and competition from low-quality goods, cross-border online shopping is developing vigorously and has become an important trend in online shopping. Because cross-border online shopping is much more complex than traditional domestic online shopping, understanding the value perceived by cross-border consumers is the most important success factor. Companies that want to develop new markets abroad must understand local consumers' behaviour and decision-making processes in order to form good business strategies. This study uses text mining, semantic sentiment analysis, and the k-means clustering algorithm to identify common patterns in consumers' reviews and the categories of goods they purchase, and tries to find how factors on the cross-border shopping platform and product reviews relate to product sales. The results provide a reference for future researchers and for the decisions of cross-border online shopping platform operators. 跨境線上購物行為 線上評論分析 文字探勘 情感分析 Cross-Border Online Shopping Behaviour Online Review Analytic Text Mining Sentiment Analysis
10	運用財經文本情感分析於台灣電子類股價指數趨勢預測之研究 / Research of applying Sentimental Analysis on financial documents to predict Taiwan Electronic Sub-Index trend 劉羿廷 Unknown Date (has links) 電子工業為台灣最具競爭力之產業,使得電子類股在集中市場成交比重高達 69.49%,可見電子類股的波動足以對整個台股市場造成相當大的影響。而許多研究指出,網路上的文本訊息藉由社會網路的催化而快速傳遞,會對群眾情緒造成影響,進而影響股價波動,故對於投資者而言,如果能快速分析大量網路財經文本來推測投資大眾情緒進而預測股價走勢,即可提升獲利。然而,每天有近百篇的財經文本產生,傳統的人工抽樣分析方式效率不彰且過於耗力, 已不足以負荷此巨量資料。過去文本情感分析的研究中已證實監督式學習方法可以透過簡單量化的方式達到良好的分類效果,但監督式學習方法所使用的訓練資料集須有事先定義好的已知類別,故其有無法預期未知類別的限制,造成無法判斷文本中可能存在的未知主題,所以本研究提出一套針對財經文本的混合監督式學習與非監督式學習之情感分析方法,透過非監督式學習將 2014 整年度的電子工業財經文本進行文本主題判別、情緒指數計算與情緒傾向標注。之後配合視覺化工具作趨勢線圖分析,找出具有領先指標特性之主題,接著再用監督式學習將其結合國際指標、總體經濟指標、台股指標、技術指標等,建立分類模型以預測台灣電子類股價指數走勢。在實驗結果中,主題標注方面,本研究發現因文本數量遠大於議題詞數量造成 TFIDF 矩陣過於稀疏,使得 TFIDF-Kmeans 主題模型分類效果不佳;而文本具有多主題之特性造成 NPMI-Concor 分群之議題詞過於複雜不易歸納,然而LDA 主題模型基於所有主題被所有文章共享的特性,使得在字詞分群與主題分類準確度都優於 TFIDF-Kmeans 和 NPMI-Concor 主題模型,分類準確度高達 98%,故後續採用 LDA 主題模型進行主題標注。情緒傾向標注方面,證實本研究擴充後的情感詞集比起 NTUSD 有更好的字詞極性判斷效果,計算出的情緒指數之趨勢線也較投資人常用的 MACD 之趨勢線更符合電子類股價指數之趨勢。此外,亦發現並非所有文本的情緒指數皆具有領先特性,僅企業營運主題與總體經濟主題之文本的情緒指數能提前反應電子類股價指數趨勢,故本研究用此二主題之文本的情緒指數來建立分類模型。接著,本研究透過比較情緒指數結合技術指標之分類模型與單純技術指標分類模型的準確率發現,前者較後者高出 7%的準確率。進一步結合間接情緒指標的分類模型更有高達 71%準確率,故證實了情感分析確實能有效提升電子股價類股指數趨勢預測準確度,以提升投資人之投資報酬率。 / The electronic industry is the most competitive industry in Taiwan, and its large volume could have strong influence on the whole stock market. Many research show that text documents on the Internet have great effect on public emotion, and the public emotion could also affect the stock price. For investors, it is important to know how to analyze the potential emotion in text documents then use this information to predict the stock trend. However, the traditional way to analyze text documents by human resource cannot afford the large volume of financial text documents on the Internet. In past Sentimental Analysis research, supervised method is proven as a method could reach high accuracy, but there are limits about predicting the future trend. 
This research proposes a solution that combines supervised and unsupervised methods to handle this large volume of financial text. First, we use an unsupervised method to identify the topic of each document and then calculate a sentiment index to judge the document's sentiment orientation. Next, we produce trend line charts with visualization tools to find which topics' sentiment indices are leading indicators. We then use a supervised method to combine the sentiment index with 24 other indirect sentiment indices to build the prediction model. The results show that the LDA model outperforms the TFIDF-Kmeans and NPMI-Concor models because of the characteristics of the documents. In addition, the sentiment dictionary we built judges word polarity more accurately than NTUSD, and the trend of the sentiment index matches the Taiwan Electronic Sub-Index (TE) more closely than the MACD line does. We also find that the sentiment indices of documents about enterprise operations and macroeconomics are leading indicators, so we use them to build the prediction model. Moreover, the prediction model that includes the sentiment index performs better than the one that includes only technical indicators. In summary, the sentiment index makes prediction of the Taiwan Electronic Sub-Index trend more accurate and can improve return on investment. 情感分析 巨量資料 LDA 主題模型 支援向量機 電子類股價指數 Sentimental analysis Big Data LDA SVM Taiwan Electronic Sub-Index Trend

1	應用情感分析於輿情之研究-以台灣2016總統選舉為例 / A Study of using sentiment analysis for emotion in Taiwan's presidential election of 2016 陳昭元, Chen, Chao-Yuan Unknown Date (has links) 從2014年九合一選舉到今年總統大選，網路在選戰的影響度越來越大，後選人可透過網路上之熱門討論議題即時掌握民眾需求。文字情感分析通常使用監督式或非監督式的方法來分析文件，監督式透過文件量化可達很高的正確率，但無法預期未知趨勢，耗費人力標注文章。本研究針對網路上之政治新聞輿情，提出一個混合非監督式與監督式學習的中文情感分析方法，先透過非監督式方法標注新聞，再用監督式方法建立分類模型，驗證分類準確率。在實驗結果中，主題標注方面，本研究發現因文本數量遠大於議題詞數量造成TFIDF矩陣過於稀疏，使得TFIDF-Kmeans主題模型分類效果不佳；而NPMI-Concor主題模型分類效果較佳但是所分出的議題詞數量不均衡，然而LDA主題模型基於所有主題被所有文章共享的特性，使得在字詞分群與主題分類準確度都優於TFIDF-Kmeans和NPMI-Concor主題模型，分類準確度高達97%，故後續採用LDA主題模型進行主題標注。情緒傾向標注方面，證實本研究擴充後的情感詞集比起NTUSD有更好的字詞極性判斷效果，並且進一步使用ChineseWordnet 和 SentiWordNet，找出詞彙的情緒強度，使得在網友評論的情緒計算更加準確。亦發現所有文本的情緒指數皆具皆能反應民調指數，故本研究用文本的情緒指數來建立民調趨勢分類模型。在關注議題分類結果的實驗，整體正確率達到95%，而在民調趨勢分類結果的實驗，整體正確率達到85%。另外建立全面性的視覺化報告以瞭解民眾的正反意見，提供候選人在選戰上之競爭智慧。 / From Taiwanese local elections, 2014 to Taiwan presidential elections, 2016. Network is in growing influence of the election. The nominee can immediately grasp the needs of the people through a popular subject of discussion on the website. Sentiment Analysis research encompasses supervised and unsupervised methods for analyzing review text. The supervised learning is proved as a powerful method with high accuracy, but there are limits where future trend cannot be recognized, and the labels of individual classes must be made manually. In the study, we propose a Chinese Sentiment Analysis method which combined supervised and unsupervised learning. First, we used unsupervised learning to label every articles. Secondly, we used supervised learning to build classification model and verified the result. According to the result of finding subject labeling, we found that TFIDF-Kmeans model is not suitable because of document characteristic. NPMI-Concor model is better than TFIDF-Kmeans model. But the subject words is not balanced. However, LDA model has the feature that all subject is share by all articles. LDA model classification performance can reach 97% accuracy. 
So we choose it to decide article subject. According to the result of sentimental labeling, the sentimental dictionary we build has higher accuracy than NTUSD on judging word polarity. Moreover, we used ChineseWordnet and SentiWordNet to calculate the strength of word. So we can have more accuracy on calculate public’s sentiment. So we use these sentiment index to build prediction model. In the result of subject labeling, our accuracy is 95%. Meanwhile, In the result of prediction our accuracy is 85%. We also create the Visualization report for the nominee to understand the positive and the negative options of public. Our research can help the nominee by providing competitive wisdom. 情感分析文字分類支援向量機 Sentiment Analysis Text Classification SVM
2	應用情感分析於媒體新聞傾向之研究-以中央社為例 / Applying sentiment analysis to the tendency of media news: a case study of central news agency 吳信維, Wu, Xin-Wei Unknown Date (has links) 本研究目的在於結合關聯規則新詞發掘演算法來擴增詞庫，並藉此提高結斷詞句的精確度以及透過非監督式情感分析方法，從中央通訊社中抓取國民黨以及民進黨的相關新聞文本，建立主題模型與情緒傾向的標注。再藉由監督式學習方法建立分類模型並驗證其成果。　　本研究藉由n-gram with a-priori algorithm來進行斷詞斷句的詞庫擴增。共有32007組詞被發掘，於這些詞中具有真正意義的詞共有28838筆，成功率可達88%。　　本研究比較兩種分群方法建立主題模型，分別為TFIDF-Kmeans以及LDA。在TFIDF-Kmeans分群結果中，因為文本數量遠大於議題詞數量，造成TFIDF矩陣過於稀疏，造成分群效果不佳。在LDA的分群結果底下，因為LDA模型其多文章多主題共享的特性，主題分類的精準度更高達八成以上。故本研究認為在分析具有多主題特性之文本，採用LDA模型來進行議題詞分群會有較佳的表現。　　本研究透過結合不同的資料時間區間，呈現出中央通訊社的新聞文本在我國近五次總統大選前後三個月間的新聞情緒傾向。同時探討各主題模型中各類別於大選前後三個月之情緒傾向變化。可以觀察到大致上文本的情感指數高峰值會出現於投票日的時候，而近三次總統大選的結果顯示，相關的政黨新聞情感值會於選舉過後趨於平緩。而從新聞文本的正負向情感統計以及以及整體情緒傾向分析可以看出，不論執政黨為何，中央通訊社的新聞對於國民黨以及民進黨皆呈現了正向且平穩的內容，大抵不會特別偏向單一政黨 / The purpose of this research is to combine association rules and new word mining algorithms to expand the lexicons so as to improve the accuracy of word segmentations, and by capturing the KMT and DPP news from the Central News Agency, it establishes the theme model and sentiment orientation through the unsupervised sentiment analysis method. Finally, by means of supervised learning methods, this research establishes classifications models and verifies its results. 　　This research uses n-gram with a-priori algorithm to segment words and sentences to expand the lexicons. A total of 32007 word are found, and among them, there have 28838 words with real meaning. The success rate is up to 88%. 　　In this research, we compare two different clustering methods to form the theme model, which are the TFIDF-Kmeans, and the LDA. From the results of TFIDF-Kmeans, the TFIDF matrix is too sparse, resulting in poor clustering because the number of texts is a lot larger than that of the issues. Unlike TFIDF-Kmeans, because of LDA model with more features of multi-topic sharing, the accuracy of topic classification is more than 80%. 
Therefore, this research suggests that it will have a better performance to analyze the multi-subjective texts with LDA model to classify the word clustering. 　　Through the combination of different data time interval, this research presents the sentimental tendencies of Central News Agency’s news in three months before and after the last five presidential elections in Taiwan. At the same time, it also explores the changes of the sentimental tendencies in the various theme models in the three months before and after the election. It can be observed the sentimental peak of the text will be appeared on the polling day, and nearly three times of the presidential election results show that the sentimental value of the relevant party’s news will become smooth after the election. From the positive and negative sentimental statistics of the news text and the analysis of the overall sentimental tendencies, no matter which the ruling party is, the news of the Central News Agency for the KMT and the DPP presents a positive and stable content, not particularly toward any political party. 情感分析 LDA主題模型 n-gram a-priori Sentiment analysis LDA N-gram A-priori
3	股市趨勢預測之研究 -財經評論文本情感分析 / Predict the trend in the stock by Sentiment analyzing financial posts 蔡宇祥, Tsai, Yu Shiang Unknown Date (has links) 根據過去研究指出，社群網站上的貼文訊息會對群眾情緒造成影響，進而影響股市波動，故對於投資者而言，如果能快速分析大量社群網站的財經文本來推測投資情緒進而預測股市走勢，將可提升投資獲利。過去文本情感分析的研究中已證實監督式學習方法可以透過簡單量化的方式達到良好的分類效果，但監督式學習方法所使用的訓練資料集須有事先定義好的已知類別，故其有無法預期未知類別的限制，所以本研究透過深度學習方法，從巨量資料及裡抓出有關於股市之文章，並透過財經文本的混合監督式學習與非監督式學習之情感分析方法，透過非監督式學習對微博財經貼文進行文本主題判別、情緒指數計算與情緒傾向標注，並且透過監督式學習的方式，建立分類模型以預測上海指數走勢，最後配合視覺化工具作趨勢線圖分析，找出具有領先指標特性之主題。在實驗結果中，深度學習方面，本研究透過word2vec抓取有效之股市主題文章，有效篩選了需要分析之文本，主題模型方面，我們最後使用LDA作為本研究標註主題之方法，因為其文本數量大於議題詞數量造成TFIDF矩陣過於稀疏，造成Kmeans分群效果不佳，故後續採用LDA主題模型進行主題標注。情緒傾向標注方面，透過擴充後的情感詞集比起NTUSD有更好的詞性分數判斷效果，計算出的情緒指數之趨勢線能有效預測上海指數之趨勢。此外，並非所有主題模型之情緒指數皆具有領先特性，僅公司表現與上海指數之主題模型的情緒指數能提前反應上海指數趨勢，故本研究用此二主題之文本的情緒指數來建立分類模型。本研究透過比較情緒指數與單純指數指標分類模型的準確度，前者較後者高出7%的準確率。故證實了情感分析確實能有效提升上海指數趨勢預測準確度，幫助投資者增加股市報酬率。情感分析 Word2vec LDA主題模型 K-means 上海股價指數
4	口碑情感對於募資專案之影響 / The Influence of eWOM Sentiment on the Success of Crowdfunding Projects 林漢文 Unknown Date (has links) 「群眾募資」為社會大眾透過小額資金的贊助，發揮群體集結的力量，支持個人或組織使其目標或專案得以執行完成。隨著群眾募資平台的出現，加速了群眾募資的發展，從國外知名的Kickstarter 到國內的Flyingv，這股募資的旋風一路席捲了國內外傳統借貸生態。然而募資專案的成功因素也變成了一個重要的課題，過去關於募資專案的文獻大多提到募資金額、募資更新次數等因素，較少著墨於投資者對於募資產品的評論或口碑因素。因此本研究提出一個更廣泛的整合架構，針對網路評論做情感分析作為影響募資專案成功的重要因素之一，並對 Kickstarter 上的專案，進行實證研究，結果發現口碑的數量及情感因素在不同類別的專案中有不同的影響。在Game, Technology 和Design 類別對募資專案成功有顯著的影響，但是在Music, Theater 和Dance 專案則沒有顯著影響。 / Crowdfunding is defined as a process or activity that openly solicits small amounts of money from a group of people or organizations to help a goal or project succeed. The appearance of crowdfunding platforms in recent years has accelerated the popularity of crowdfunding. From Kickstarter to Flyingv, this crowdfunding trend has changed the traditional lending ecosystem. However, not all crowdfunding projects are successful. A substantial number of proposed projects fail because they are unable to raise the target amount. Therefore, it is interesting to investigate factors that may affect the success of a fundraising project. Previous literature has reported several success factors for crowdfunding, such as the target amount, the number of updates, and so on. However, few studies have investigated the effect of project reviews. It is clear that word of mouth plays an important role in consumer decisions, and it is reasonable to believe that project reviews, as a kind of word of mouth, will affect investors’ decisions. Hence, this study adopts sentiment analysis to analyze how the sentiment of project reviews, along with other factors, may affect eventual project success. Data collected from Kickstarter.com were used to evaluate our research model. 
Our findings indicate that the number and sentiment of project reviews did have an impact on fundraising success, but only in certain categories such as game, design, and technology, which seem to have objective evaluation criteria. Their effect was not significant in categories such as music, theater, and dance, in which investors’ preferences may be very subjective. 情感分析 關鍵成功因素 群眾募資 sentiment analysis success factor crowdfunding
5	應用情感分析於指數型證券投資信託基金趨勢預測之研究 / Research into sentimental analysis to predict exchange-traded fund trend 黃泓銘, Huang, Hung-Ming Unknown Date (has links) 近年來ETF規模快速成長，亞洲區域經濟成長與穩步發展更是帶動國際ETF市場動力來源，而元大台灣50指數型證券投資信託基金因規模大，受到投資人的青睞。根據過去的研究指出，網路上的文本訊息會對群眾情緒造成影響，進而影響股價波動，對投資者而言，若能從大量網路財金快速分析投資者大眾情緒進而預測股價波動走勢，勢必可提高報酬率。然而，每日有上百篇的財金文本產生，人工分析耗時耗力，本研究採用文字探勘技術，提出一套情感分析的價格預測模型。過去文本情感分析的研究中已證實監督式學習方法可以透過簡單量化的方式達到良好的分類效果，然而，為解決監督式學習無法預期未知的限制，本研究透過非監督式學習將2016整年度的財金文本進行文章主題判別，計算情緒指數並標記文本情緒傾向，再來使用監督式學習結合台股資訊指標、國際指標、總體經濟指標、技術指標等，建立分類模型以預測元大台灣50ETF的價格趨勢。實驗結果中，主題標注方面，本研究發現因文本數量遠大於議題詞數量造成TF-IDF矩陣過於稀疏，使得TF-IDF結合K-means主題模型分類效果不佳。LDA主題模型基於所有主題被所有文章共享的特性，使得在字詞分群優於TF-IDF結合K-means。情緒傾向標注方面，證實本研究擴充後的情感詞集比起NTUSD有更好的字詞極性判斷效果。本研究透過比較情緒指數結合技術指標之分類模型與單純技術指標分類模型的準確率發現，前者較後者高出7%的準確率。進一步結合間接情緒指標的分類模型更有71%準確率，故證實財金文本的情感分析確實能有效提升元大台灣50的價格趨勢預測。 / Rapid and stable economic growth in Asia has driven rapid growth in the asset scale of ETFs worldwide in recent years. The Yuanta Taiwan Top 50 ETF has gained investors’ favor because of its large market scale. Past research has shown that text documents on the internet, e.g. news and tweets, strongly affect public emotion, and that public emotion can in turn affect stock prices. For investors, it is important to know how to analyze the latent emotion in text documents to predict the stock trend. However, manual analysis cannot keep up with the large volume of financial text documents on the internet. In past sentiment analysis research, supervised methods have been shown to achieve high accuracy, but they are limited in predicting unknown future trends. This research combines supervised and unsupervised methods to handle these large volumes of financial text. An unsupervised method is used to identify the topic of each document, and a sentiment index is then calculated for each document to determine its sentiment polarity. 
Afterwards, a supervised method is used to build a prediction model with the sentiment index. According to the results, the LDA model performs better than TF-IDF with K-means. Moreover, the prediction model that includes the sentiment index has higher accuracy than the one that includes only the technical indicators. 情感分析 LDA主題模型 支援向量機 ETF Sentimental analysis LDA SVM ETF
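The abstract above computes a sentiment index per document and uses it to decide polarity, but it does not state the formula. The following is only a minimal sketch of one common lexicon-based scheme, assuming the text has already been segmented into tokens. The function names, the toy lexicon, its weights, and the averaging rule are all hypothetical and are not taken from the thesis.

```python
def sentiment_index(tokens, lexicon):
    """Average the lexicon weights of the tokens that appear in the lexicon.

    Assumption: `lexicon` maps a word to a signed weight (positive > 0,
    negative < 0). Tokens missing from the lexicon are ignored.
    """
    weights = [lexicon[tok] for tok in tokens if tok in lexicon]
    return sum(weights) / len(weights) if weights else 0.0


def polarity(index, threshold=0.0):
    """Map a sentiment index to a polarity label (threshold is an assumption)."""
    if index > threshold:
        return "positive"
    if index < -threshold:
        return "negative"
    return "neutral"


# Hypothetical toy lexicon; a real study would use NTUSD or an expanded word set.
toy_lexicon = {"上漲": 1.0, "獲利": 0.5, "下跌": -1.0}
doc = ["股價", "上漲", "獲利"]
idx = sentiment_index(doc, toy_lexicon)
print(idx, polarity(idx))
```

Averaging over matched words keeps long and short documents comparable; a sum or a positive-to-negative ratio would be equally plausible choices.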
6	應用情感型態分析於指數股票型基金趨勢研究-以台灣卓越50基金為例 / A study on the trend of exchange traded funds by sentiment pattern analysis in Yuanta Taiwan Top 50 ETF 林詠翔, Lin, Yong-Xiang Unknown Date (has links) 根據研究指出 ETF 資產規模近幾年快速成長，元大台灣卓越 50 基金因市場規模大等優勢受到投資人的青睞，賴以巨量資料的發展使得文字探勘技術成熟，故本研究希冀提出一套情感分析的價格預測模型，提升投資者的報酬率。過往學者以文章中的單詞作為文字探勘的分析單位，常會產生同義詞、多義詞的問題，因此提出情感型態分析的監督式學習方法建立模型。另外為了解決監督式學習難以取得訓練資料的限制，本研究混合非監督式學習方法進行主題分群與情緒傾向標注。本研究建立台灣股市新聞文本資料集，並篩選熱門議題詞詞庫，進行非監督式的 LDA 主題模型，發現在 2016 年總統選舉期間，媒體對於公司相關議題的注意力降低，使得相關的文本數量大幅減少;另外在情緒傾向標注階段，因混和了 NTUSD、知網及自行擴充演算法的情感詞庫，能夠將 10%中性詞彙產生極性判斷、96%的文本標注情緒傾向。視覺化工具分析結果指出，DIF-MACD 能夠預測台灣卓越 50 基金的長期走勢，而新聞情緒指數則在短期的價格波動上表現良好，且在主題模型分群中，總體經濟、公司維運類別的新聞情緒指數具有約 1-2 日領先指標特性，對於後續的價格預測模型有所助益。在監督式情感分析方法，為解決上述同義詞、多義詞的問題，本研究採用型態分類模型於中文文本，並與向量空間模型、支援向量機等方法做比較。實驗結果指出優化的型態分類模型，並結合台灣加權股價指數，表現相對良好，F1- Measure 可達 85%。進一步討論新聞情緒對於價格預測的重要性，發現在非交易時間序列中的新聞情緒，能夠對 0050 的價格波動產生影響。 / Past research points out that the scale of ETF assets has been growing rapidly in recent years. The Yuanta Taiwan Top 50 ETF is popular with investors because of its large market scale. With the development of big data, text mining technology has matured. We therefore propose a sentiment-analysis-based price forecasting model to raise investors' rate of return. Text mining research has traditionally used individual document terms as the unit of analysis, which often leads to problems with synonymy and polysemy. Therefore, this research proposes a supervised learning method based on sentiment pattern analysis. In addition, to overcome the difficulty of obtaining training data for supervised learning, we combine it with an unsupervised learning method for topic clustering and sentiment tendency labeling. In this study, we build a dataset of Taiwan stock market news and filter a lexicon of popular topic terms for use in an unsupervised LDA topic model. The results show that the number of company-related news articles dropped significantly during the 2016 Taiwan presidential election because media attention shifted away from company-related topics. 
Moreover, by combining NTUSD, the HowNet knowledge database, and a self-expansion algorithm, we create a sentiment dictionary that can assign polarity to 10% of neutral terms and label the sentiment tendency of 96% of documents. Data visualization shows that the DIF-MACD curve is able to predict the long-term trend of 0050, while the news sentiment index performs well on short-term price volatility. In addition, the news sentiment indices of the macroeconomy and company operations topics act as leading indicators by about 1 to 2 days. Finally, we apply the sentiment Pattern Taxonomy Model (PTM) to Chinese texts as the supervised learning method and compare it with VSM and SVM. The experimental results show that the optimized PTM combined with the Taiwan Weighted Stock Index performs best, with an F1-measure of up to 85%. Beyond this, we find that the sentiment index of news published during non-trading hours can influence the price volatility of 0050. 情感分析 LDA主題模型 型態模型 指數股票型基金 Sentimental analysis LDA Pattern model ETF
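The abstract above refers to the DIF-MACD curve without defining it. As a rough sketch of the conventional definition, DIF is the fast exponential moving average (EMA) of closing prices minus the slow EMA, and the MACD line is an EMA of DIF. The 12/26/9 periods and the choice to seed each EMA with the first value are common defaults assumed here; the thesis does not state its settings, and the function names are illustrative.

```python
def ema(values, span):
    """Exponential moving average, seeded with the first value (an assumption)."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def dif_macd(closes, fast=12, slow=26, signal=9):
    """Return (DIF, MACD) series for a list of closing prices.

    DIF  = EMA(fast) - EMA(slow)
    MACD = EMA(signal) of DIF
    """
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    dif = [f - s for f, s in zip(fast_ema, slow_ema)]
    macd = ema(dif, signal)
    return dif, macd


# Hypothetical, steadily rising prices: DIF ends above zero.
prices = [float(p) for p in range(1, 41)]
dif, macd = dif_macd(prices)
print(dif[-1] > 0, macd[-1] > 0)
```

Because both lines are smoothed averages, they react slowly, which fits the abstract's observation that DIF-MACD suits the long-term trend while news sentiment tracks short-term moves.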
7	網路評價搜尋結果的正負意見分類系統 / A sentiment classification system on search results of web opinions 黃泓彰, Huang, Hung Chang Unknown Date (has links) 本研究嘗試建置一個包含兩個主要功能的系統，分別是網路評價搜尋以及情感分類。在網路評價搜尋的部份，我們使用Google搜尋並蒐集一攜帶型智慧裝置（智慧型手機、平板電腦與筆記型電腦）的網路評價搜尋結果；情感分類的部分則是將搜尋結果依照對該產品的意見分類為，共有正面／負面／中立、正面／負面、正面／非正面，以及負面／非負面等四種分類方式。為了建置此系統，我們首先從知名的網路論壇Mobile01和批踢踢蒐集和攜帶型智慧裝置有關的網路文章以及產品名稱，接著以人工的方式標記每篇文章，以及部分文章中的句子的情感。本研究設計了兩個層次的情感分類實驗，我們首先從語句層次出發，以監督式機器學習法訓練將句子分為正面／負面／中立等三個類別的分類模型後，再進入文章層次，將句子的意見彙整，並同樣以監督式機器學習法訓練四種不同文章層次的分類模型：正面／負面／中立、正面／負面、正面／非正面，以及負面／非負面。我們分別選出四種分類實驗中表現最佳的模型，並用於系統建置，其中表現最佳的是分類為正面／負面的分類模型，平均的F-measure為0.87；其次是分類為負面／非負面的模型，對負面類別的F-measure為0.83；接著是分類為正面／非正面的模型，對正面類別的F-measure為0.81；表現最差的是正面／負面／中立的分類，平均的F-measure為0.77。在正面／負面分類的準確率上，本研究的表現並不壞於過去以英文為主要語言的相關研究。最後，我們也以過去不經過語句層次的分類方法進行實驗並比較，其結果發現經過語句層次的情感分類比不經過語句層次的情感分類較佳。 / In this research, we implemented a system that retrieves the search results of mobile phones, tablets, and notebooks from Google, and then classifies them as: (1) positive, negative, or neutral, (2) positive or negative, (3) positive or non-positive, (4) negative or non-negative. To build this system, first we collected some documents about mobile phones, tablets, and notebooks on two popular web forums: mobile01.com and ptt.cc. Next, a sentiment label (positive, negative, or neutral) is attached to each document and each sentence of these documents. We designed a two-level supervised sentiment classification experiment. At sentence level, we trained classifiers that classify sentences as positive, negative, or neutral. The best sentence classifier was then used at document level. At document level, the sentiment labels of the sentences in documents are used. We trained classifiers in four different classification problems: (1) positive, negative, or neutral, (2) positive vs. negative, (3) positive vs. non-positive, (4) negative vs. non-negative. The best is the second classifier with an average F-measure of 0.87. 
Next is the fourth classifier, with an F-measure of 0.83 on the negative class, followed by the third classifier, with an F-measure of 0.81 on the positive class. Last is the first classifier, with an average F-measure of 0.77. On positive vs. negative classification, our accuracy is no worse than that of previous studies on English text. Finally, we ran another classification experiment using a document-level-only method, and the results showed that our two-level sentiment classification (first sentence level, then document level) outperforms document-level-only sentiment classification. 意見探勘 情感分析 情感分類 網路評價 Opinion mining Sentiment analysis Sentiment classification Web opinion
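The results above are reported as per-class and average F-measures. As a brief illustration, the F-measure (F1) for one class is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below computes it from label lists. Treating the "average F-measure" as an unweighted (macro) mean over classes is an assumption, since the abstract does not say how the average was taken, and the labels are made-up examples.

```python
def f_measure(true_labels, predicted_labels, target):
    """Return (precision, recall, F1) for the class `target`."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f_measure(true_labels, predicted_labels):
    """Unweighted mean of per-class F1 (assumed meaning of 'average')."""
    classes = sorted(set(true_labels))
    scores = [f_measure(true_labels, predicted_labels, c)[2] for c in classes]
    return sum(scores) / len(scores)


# Made-up labels for a positive vs. negative task.
truth = ["pos", "pos", "neg", "neg"]
guess = ["pos", "neg", "neg", "neg"]
print(f_measure(truth, guess, "neg"), macro_f_measure(truth, guess))
```

Reporting the F-measure on a single class, as the abstract does for the positive/non-positive and negative/non-negative tasks, corresponds to calling `f_measure` with that class as `target`.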
8	兩種中文情感運算分析策略：以部首為基礎及深層類神經學習 / Two Chinese Sentiment Analysis Approaches: Radical-based and Deep Learning Neural Network 趙逢毅, Chao, August F.Y. Unknown Date (has links) 評論是所有人類行為的核心，因為它影響我們行為的關鍵因素。我們都試著從不同型式的評論分析與研究試著從作者字裡行間的文字呈現內容深入推敲及理解，從而要能過濾出能協助決策的有用資訊。在早期的評論研究將評論視為是文本分類問題，直到2000年前後，從分析評論的主觀句子與評論裡形容詞的程度衡量用詞，學者們開始對解構整篇文本的內容，並試著從語言學的角度分析用字遣詞與情感方向之間的關聯。這種從文字語義關聯分析評論的方式，也使文本挖掘技術必需結合自然語言的處理原則，才能更準確地了解評論的內容。隨著許多新興的機器學習演算法與自然語言處理方法不斷地推陳出新，及網路使用行為拓展至電子商務與線上虛擬社群的建立，情感分析研究亦開始不斷地蓬勃發展。漢文不同於世界其它語言，它擁有許多獨特表徵：無空格區隔、一字一語素、依詞為語言中表達意義的最小獨立單位，也使得在套用源自西方的情感分析原則時更加困難。然而過去的研究者則加以利用這些語言特徵，建立出專屬中文的情感分析原則。我們務實地討論適用於中文情感分析的情境(a)可取得情感分析資源及專家語言智慧，及(b)可取得領域字詞特徵向量定義的兩個前題下，提出適合的中文情感分析策略。在情境(a)中，我們深入討論運用部首資訊至情感分析中的適用性，並且提出一套能精萃出領域評論文本的觀測字詞/部首組的方法。研究中我們萃取出50個部首組，並運用在領域相近的評論裡得到很好的情感分類成效。而在情境(b)中我們提出適合深層類神經網路學習方法的評論字詞的權重過濾原則，不僅能確保評論字詞在學習過程中仍保有能積旋出合適屬性，並且驗證此權重原則在支援向量機的學習方式下亦有相同的優勢。在研究中，我們亦討論此兩種情境下進行情感分析的必要條件與資訊，並為未來更深入的中文情感分析起到墊腳石的作用。 / Opinion is at the core of human behavior because it directly influences the key factors behind our actions. Whether in personal or organizational decision making, we all constantly conduct various kinds of opinion analysis, including explaining and comprehending what authors express. Early opinion studies treated the task as a text mining problem and tried to sort opinions into positive and negative groups. After 2000, researchers began to decompose whole opinions into sentences by analyzing the subjective expressions and adjectives within them, and to explain the relationship between semantics and sentiment from a linguistic perspective. Therefore, opinion analysis has to incorporate natural language processing techniques so that opinion content can be understood. Today, sentiment analysis is booming thanks to emerging machine learning and natural language processing approaches, as well as the needs of electronic commerce and online virtual communities. 
Unfortunately, Chinese differs from other languages: it is written without spaces between words, each character is one morpheme, and words (composed of several characters) are the minimal unit of semantic expression. These features make it difficult to adopt sentiment analysis principles developed for English. Nevertheless, researchers have leveraged these features of Chinese to propose sentiment analysis approaches dedicated to Chinese opinions. In this study, we discuss two practical scenarios for conducting sentiment analysis: (a) using sentiment analysis resources and experts’ knowledge; and (b) using word feature vectors (word2vec) and deep learning. In scenario (a), we propose a Chinese radical-based sentiment analysis approach and test its applicability. We also propose a feature extraction method that generates 50 seeds for further analysis. In scenario (b), we compare 4 different feature selection approaches for deep learning, in order to maintain accuracy and ensure that understandable features can be generated in the neural network. We also tested the feature selection approaches with an SVM classifier and obtained similar results. We also discuss the essential constraints and required information in both scenarios, and the results of this study can serve as a foundation for further Chinese sentiment analysis studies. 中文情感分析 部首資訊 深層學習 屬性選擇 屬性萃取 Chinese Sentiment Analysis Radical Information Deep Learning Feature Selection Feature Extraction
9	消費者輿情對跨境網購產品銷售量之影響：以淘寶網為例 / The Effects of Consumer Comments and Sentiments on Product Sales of Cross-border Shopping Websites: The Taobao Case 呂奕勳 Unknown Date (has links) 近年來傳統線上購物正面臨著一連串的市場困境，如削價競爭、廉價品競爭等，因此導致銷售量之成長趨緩，反觀跨境線上購物卻出現了蓬勃發展的態勢，因而讓跨境線上購物成為驅動經濟活動與國際貿易的新引擎。另一方面，由於跨境線上購物的情境複雜性遠高於傳統的境內線上購物，業者們欲開發一海外新市場，必須先了解該地消費者行為與其購買決策過程後，才能制定出好的商業策略，並且進一步將產品導向的服務轉化成為以顧客導向的服務，才有機會為傳統線上購物之困境另闢生機。因此，引取並了解消費者所體認的內在價值是經營跨境線上購物最重要的成功因素。本研究將試圖將傳統境內線上購物研究擴展到跨境線上購物議題，藉由文字探勘(Text Mining)分析、語意情感分析與 k-means 分群演算法，挖掘出消費者對於所購買商品之評論的常見內容型態與所購買商品之類別，並試圖找出跨境網購平台上各項因素及商品評論對於產品銷售量間之關連性，提供未來研究者及跨境網購平台業者決策之依據。 / While online shopping websites face price competition and competition from low-quality goods, cross-border online shopping is developing vigorously and has become an important trend in online shopping. Because cross-border online shopping is far more complex than traditional domestic online shopping, understanding what cross-border consumers value is the most important success factor. Companies that want to develop new markets abroad must understand local consumers’ behaviour and decision-making processes in order to form good business strategies. This study uses text mining, semantic analysis techniques, and the k-means clustering algorithm to identify the characteristics of consumers’ reviews and the common categories of goods they purchase. By understanding why consumers use cross-border online shopping services and what value they obtain from them, researchers can predict and analyse the evolution and development of cross-border online shopping, providing a reference for future academic studies and for decision-making in the online shopping industry. 跨境線上購物行為 線上評論分析 文字探勘 情感分析 Cross-Border Online Shopping Behaviour Online Review Analytic Text Mining Sentiment Analysis
10	運用財經文本情感分析於台灣電子類股價指數趨勢預測之研究 / Research of applying Sentimental Analysis on financial documents to predict Taiwan Electronic Sub-Index trend 劉羿廷 Unknown Date (has links) 電子工業為台灣最具競爭力之產業,使得電子類股在集中市場成交比重高達 69.49%,可見電子類股的波動足以對整個台股市場造成相當大的影響。而許多研究指出,網路上的文本訊息藉由社會網路的催化而快速傳遞,會對群眾情緒造成影響,進而影響股價波動,故對於投資者而言,如果能快速分析大量網路財經文本來推測投資大眾情緒進而預測股價走勢,即可提升獲利。然而,每天有近百篇的財經文本產生,傳統的人工抽樣分析方式效率不彰且過於耗力, 已不足以負荷此巨量資料。過去文本情感分析的研究中已證實監督式學習方法可以透過簡單量化的方式達到良好的分類效果,但監督式學習方法所使用的訓練資料集須有事先定義好的已知類別,故其有無法預期未知類別的限制,造成無法判斷文本中可能存在的未知主題,所以本研究提出一套針對財經文本的混合監督式學習與非監督式學習之情感分析方法,透過非監督式學習將 2014 整年度的電子工業財經文本進行文本主題判別、情緒指數計算與情緒傾向標注。之後配合視覺化工具作趨勢線圖分析,找出具有領先指標特性之主題,接著再用監督式學習將其結合國際指標、總體經濟指標、台股指標、技術指標等,建立分類模型以預測台灣電子類股價指數走勢。在實驗結果中,主題標注方面,本研究發現因文本數量遠大於議題詞數量造成 TFIDF 矩陣過於稀疏,使得 TFIDF-Kmeans 主題模型分類效果不佳;而文本具有多主題之特性造成 NPMI-Concor 分群之議題詞過於複雜不易歸納,然而LDA 主題模型基於所有主題被所有文章共享的特性,使得在字詞分群與主題分類準確度都優於 TFIDF-Kmeans 和 NPMI-Concor 主題模型,分類準確度高達 98%,故後續採用 LDA 主題模型進行主題標注。情緒傾向標注方面,證實本研究擴充後的情感詞集比起 NTUSD 有更好的字詞極性判斷效果,計算出的情緒指數之趨勢線也較投資人常用的 MACD 之趨勢線更符合電子類股價指數之趨勢。此外,亦發現並非所有文本的情緒指數皆具有領先特性,僅企業營運主題與總體經濟主題之文本的情緒指數能提前反應電子類股價指數趨勢,故本研究用此二主題之文本的情緒指數來建立分類模型。接著,本研究透過比較情緒指數結合技術指標之分類模型與單純技術指標分類模型的準確率發現,前者較後者高出 7%的準確率。進一步結合間接情緒指標的分類模型更有高達 71%準確率,故證實了情感分析確實能有效提升電子股價類股指數趨勢預測準確度,以提升投資人之投資報酬率。 / The electronics industry is the most competitive industry in Taiwan, and its large trading volume can strongly influence the whole stock market. Many studies show that text documents on the Internet strongly affect public emotion, and that public emotion can in turn affect stock prices. For investors, it is important to know how to analyze the latent emotion in text documents and use it to predict the stock trend. However, manual analysis cannot keep up with the large volume of financial text documents on the Internet. In past sentiment analysis research, supervised methods have been shown to reach high accuracy, but they are limited in predicting future trends. 
This research proposes a solution that combines supervised and unsupervised methods to handle these large volumes of financial text. First, we use an unsupervised method to identify the topic of each document and then calculate a sentiment index to judge the document’s emotional direction. Next, we produce trend line charts with visualization tools to find out which topics’ sentiment indices are leading indicators. We then use a supervised method to integrate the sentiment index with 24 other indirect sentiment indices to build the prediction model. According to the results, the LDA model performs better than the TFIDF-Kmeans and NPMI-Concor models because of the characteristics of the documents. In addition, the sentiment dictionary built in this study judges word polarity more accurately than NTUSD. The trend of the sentiment index tracks the Taiwan Electronic Sub-Index (TE) more closely than the MACD line does. We also find that the sentiment indices produced from documents about enterprise operations and macroeconomics are leading indicators, so we use these to build the prediction model. Moreover, the prediction model that includes the sentiment index performs better than the one that includes only the technical indicators. In sum, the sentiment index makes predictions of the Taiwan Electronic Sub-Index trend more accurate and can improve return on investment. 情感分析 巨量資料 LDA 主題模型 支援向量機 電子類股價指數 Sentimental analysis Big Data LDA SVM Taiwan Electronic Sub-Index Trend
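The abstract above identifies leading indicators by inspecting trend line charts. One way to make that check numeric, offered here only as a sketch and not as the thesis's method, is a lagged Pearson correlation: pair the sentiment index on day t with the stock index on day t + k and see which lag k correlates best. The function names and the sample series are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def best_lead(sentiment, index, max_lag=3):
    """Correlate sentiment[t] with index[t + k] for k = 0..max_lag.

    Returns (best_k, scores). best_k > 0 suggests sentiment leads the index.
    """
    scores = {}
    for k in range(max_lag + 1):
        xs = sentiment[: len(sentiment) - k]
        ys = index[k:]
        scores[k] = pearson(xs, ys)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Made-up series in which the index repeats the sentiment two days later.
sentiment = [1, 3, 2, 5, 4, 6, 2, 7]
index = [0, 0, 1, 3, 2, 5, 4, 6]
print(best_lead(sentiment, index)[0])
```

With short daily series such correlations are noisy, so in practice a result like this would only support, not replace, the visual trend-line comparison the abstract describes.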
