Global ETD Search

1	應用情感分析於輿情之研究-以台灣2016總統選舉為例 / A Study of using sentiment analysis for emotion in Taiwan's presidential election of 2016 陳昭元, Chen, Chao-Yuan Unknown Date (has links)
From the 2014 "nine-in-one" local elections to the 2016 presidential election, the Internet has played a growing role in Taiwanese election campaigns, and candidates can follow popular online discussion topics to gauge voters' needs in real time. Text sentiment analysis typically uses supervised or unsupervised methods. Supervised methods can reach high accuracy with simple document quantification, but they cannot anticipate unknown trends, and every training document must be labeled by hand. For online political news, this study proposes a Chinese sentiment analysis method that combines unsupervised and supervised learning: news articles are first labeled with unsupervised methods, and supervised methods are then used to build classification models and verify their accuracy. For topic labeling, the number of documents far exceeds the number of topic words, which makes the TFIDF matrix too sparse and causes the TFIDF-Kmeans topic model to classify poorly. The NPMI-Concor topic model classifies better, but its clusters contain unbalanced numbers of topic words. Because every LDA topic is shared across all documents, LDA outperforms both TFIDF-Kmeans and NPMI-Concor in word clustering and topic classification, reaching 97% accuracy, so LDA was adopted for topic labeling. For sentiment labeling, the expanded sentiment lexicon built in this study judges word polarity better than NTUSD. ChineseWordnet and SentiWordNet were further used to estimate the sentiment strength of each word, which makes the sentiment scores of netizens' comments more accurate. The sentiment indices of all documents were also found to reflect opinion-poll figures, so they were used to build a poll-trend classification model. Overall accuracy reached 95% for topic classification and 85% for poll-trend classification. A comprehensive visualization report was also built to show the public's positive and negative opinions, giving candidates competitive intelligence for their campaigns.
Keywords: Sentiment Analysis; Text Classification; SVM
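The lexicon-based scoring described in entry 1 can be sketched as follows. In this scheme, each matched word contributes its polarity weighted by a strength score. This is a minimal illustration under stated assumptions, not the author's method:

- The lexicon entries are hypothetical.
- The `(P - N) / (P + N)` index formula is an assumption.
- The names `document_score` and `sentiment_index` are hypothetical.
- Documents are assumed to be word-segmented already.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon entry: word -> (polarity, strength).
# polarity is +1 or -1; strength in (0, 1] stands in for a SentiWordNet-style score.
Lexicon = Dict[str, Tuple[int, float]]


def document_score(tokens: List[str], lexicon: Lexicon) -> Tuple[float, float]:
    """Return (positive mass, negative mass) for one word-segmented document."""
    pos = neg = 0.0
    for tok in tokens:
        if tok in lexicon:
            polarity, strength = lexicon[tok]
            if polarity > 0:
                pos += strength
            else:
                neg += strength
    return pos, neg


def sentiment_index(docs: List[List[str]], lexicon: Lexicon) -> float:
    """(P - N) / (P + N) over a batch of documents, in [-1, 1]; 0.0 if no word matches."""
    total_pos = total_neg = 0.0
    for tokens in docs:
        p, n = document_score(tokens, lexicon)
        total_pos += p
        total_neg += n
    total = total_pos + total_neg
    return (total_pos - total_neg) / total if total else 0.0


if __name__ == "__main__":
    lexicon = {"支持": (1, 0.8), "看好": (1, 0.6), "失望": (-1, 0.9)}
    docs = [["我", "支持", "候選人"], ["政策", "令人", "失望"], ["看好", "選情"]]
    print(round(sentiment_index(docs, lexicon), 3))  # prints 0.217
```

Aggregating a day's or week's documents this way yields a time series that can sit beside poll figures. Per-document scores from `document_score` could instead serve as sentiment labels.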
2	應用情感分析於指數型證券投資信託基金趨勢預測之研究 / Research into sentimental analysis to predict exchange-traded fund trend 黃泓銘, Huang, Hung-Ming Unknown Date (has links)
ETFs have grown rapidly in recent years, with steady economic growth in Asia driving the international ETF market. The Yuanta Taiwan Top 50 ETF is popular with investors because of its large scale. Past research shows that online text, such as news and tweets, influences public emotion, which in turn affects stock price movements. Investors who can quickly gauge public sentiment from large volumes of online financial text and use it to predict price trends should therefore be able to improve their returns. However, hundreds of financial texts appear every day, and manual analysis is too slow and labor-intensive, so this study applies text mining to propose a sentiment-based price prediction model. Earlier sentiment analysis studies showed that supervised learning achieves good classification with simple quantification, but it cannot anticipate unknown trends. To address this limitation, the study uses unsupervised learning to identify the topics of financial texts from all of 2016, compute a sentiment index, and label the sentiment polarity of each text. Supervised learning then combines the sentiment index with Taiwan stock market, international, macroeconomic, and technical indicators to build classification models that predict the price trend of the Yuanta Taiwan Top 50 ETF. For topic labeling, the number of documents far exceeds the number of topic words, which makes the TF-IDF matrix too sparse and causes TF-IDF with K-means to classify poorly. LDA, in which every topic is shared across all documents, clusters words better. For sentiment labeling, the expanded sentiment lexicon judges word polarity better than NTUSD. A model combining the sentiment index with technical indicators achieves 7% higher accuracy than one using technical indicators alone, and adding indirect sentiment indicators raises accuracy to 71%. This confirms that sentiment analysis of financial texts improves trend prediction for the Yuanta Taiwan Top 50 ETF.
Keywords: Sentiment Analysis; LDA; SVM; ETF
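The sparsity problem that entries 1 to 3 attribute to TF-IDF can be made concrete by measuring the share of non-zero cells in the document-by-vocabulary matrix. This sketch makes several assumptions:

- It uses plain raw-count TF with `idf = log(N / df)`.
- It uses a toy corpus.
- The function names are hypothetical.

The theses do not state which TF-IDF variant they used.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Raw term frequency times idf = log(N / df).
    Each row keeps only non-zero weights, i.e. a sparse representation."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    rows = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        # A term found in every document gets idf = log(1) = 0 and drops out.
        rows.append({t: w for t, w in weights.items() if w > 0.0})
    return rows


def density(docs: List[List[str]], rows: List[Dict[str, float]]) -> float:
    """Share of non-zero cells in the full documents x vocabulary matrix."""
    vocab = {t for doc in docs for t in doc}
    cells = len(docs) * len(vocab)
    return sum(len(r) for r in rows) / cells if cells else 0.0


if __name__ == "__main__":
    docs = [["etf", "rally"], ["etf", "selloff"], ["etf", "central_bank"], ["etf", "rate_hike"]]
    print(density(docs, tfidf(docs)))  # prints 0.2
```

With short texts and a growing vocabulary, density falls quickly. Euclidean K-means on such rows then has little shared signal to cluster on, which is consistent with the poor TF-IDF-Kmeans results reported above.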
3	運用財經文本情感分析於台灣電子類股價指數趨勢預測之研究 / Research of applying Sentimental Analysis on financial documents to predict Taiwan Electronic Sub-Index trend 劉羿廷 Unknown Date (has links)
The electronics industry is Taiwan's most competitive industry, and electronics stocks account for as much as 69.49% of trading on the centralized exchange market, so their fluctuations strongly affect the whole Taiwan stock market. Many studies indicate that online text spreads quickly through social networks and affects public emotion, which in turn affects stock prices. Investors who can quickly analyze large volumes of online financial text to infer investor sentiment and predict stock trends can therefore increase their profits. However, nearly a hundred financial texts are produced every day, and traditional manual sampling is too inefficient and labor-intensive for this volume of data. Earlier sentiment analysis studies showed that supervised learning achieves good classification with simple quantification. However, its training data must have predefined classes, so it cannot recognize unknown categories such as previously unseen topics in the text. This study therefore proposes a sentiment analysis method for financial text that mixes supervised and unsupervised learning. Unsupervised learning is applied to electronics-industry financial texts from all of 2014 to identify topics, compute sentiment indices, and label sentiment polarity. Trend-line charts produced with visualization tools are then used to find topics whose sentiment indices act as leading indicators. Supervised learning combines these indices with international, macroeconomic, Taiwan stock market, and technical indicators to build classification models that predict the trend of the Taiwan Electronic Sub-Index (TE). For topic labeling, the number of documents far exceeds the number of topic words, which makes the TFIDF matrix too sparse and causes TFIDF-Kmeans to classify poorly. The multi-topic nature of the documents also makes the topic words from NPMI-Concor clustering too complex to summarize. Because every LDA topic is shared across all documents, LDA outperforms both TFIDF-Kmeans and NPMI-Concor in word clustering and topic classification, reaching 98% accuracy, so LDA was adopted for topic labeling. For sentiment labeling, the expanded sentiment lexicon judges word polarity better than NTUSD. The trend line of the resulting sentiment index also follows the TE trend more closely than the MACD line commonly used by investors. Not all sentiment indices lead the market: only those computed from documents on enterprise operations and on macroeconomics anticipate TE trends, so these two were used to build the classification model. A model combining the sentiment index with technical indicators achieves 7% higher accuracy than one using technical indicators alone. Adding 24 indirect sentiment indicators raises accuracy to as high as 71%. Sentiment analysis can therefore make TE trend prediction more accurate and help investors improve their returns.
Keywords: Sentiment Analysis; Big Data; LDA; SVM; Taiwan Electronic Sub-Index Trend
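Entry 3 finds leading topics by inspecting trend-line charts. A simple numerical companion is to correlate today's sentiment with the index some periods later and see which lag correlates best. The sketch below is hypothetical and is not the procedure used in the thesis:

- It uses lagged Pearson correlation.
- It assumes both series are equal-length and aligned on the same dates.
- The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(sentiment: Sequence[float], index: Sequence[float],
                        max_lag: int) -> List[Tuple[int, float]]:
    """corr(sentiment[t], index[t + lag]) for lag = 0..max_lag.
    A high value at lag > 0 suggests sentiment moves before the index."""
    if len(sentiment) != len(index):
        raise ValueError("series must have equal length")
    if max_lag > len(sentiment) - 2:
        raise ValueError("max_lag leaves fewer than two overlapping points")
    out = []
    for lag in range(max_lag + 1):
        x = sentiment[: len(sentiment) - lag]
        y = index[lag:]
        out.append((lag, pearson(x, y)))
    return out


def best_lead(sentiment: Sequence[float], index: Sequence[float],
              max_lag: int) -> Tuple[int, float]:
    """Lag with the highest correlation (ties go to the smaller lag)."""
    return max(lagged_correlations(sentiment, index, max_lag), key=lambda p: p[1])
```

For example, if the index simply repeats the sentiment series two periods later, `best_lead(sentiment, index, 3)` returns lag 2 with a correlation of 1.0. In real data, a positive best lag is only suggestive and does not establish a causal lead.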
4	使用AUC特徵選取方法在蛋白質質譜儀資料分類之應用 / An AUC criterion for feature selection on classifying proteomic spectra data 葉勝宗 Unknown Date (has links)
Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS) produces high-dimensional proteomic spectra that are mainly used to detect the expression of protein molecules. Because of limitations of the SELDI technique, the scanned spectra often contain errors and noise. The raw data therefore usually go through low-level preprocessing before analysis: baseline subtraction, normalization, peak detection, and peak alignment. The prostate cancer data studied here fall into four classes: healthy men, benign prostatic hyperplasia, early-stage prostate cancer, and late-stage prostate cancer. We analyze and compare two preprocessed versions of the data, one preprocessed by ourselves and one preprocessed by Adam et al. SELDI mass measurements commonly show mass shifts and isotope effects. To address these, we use segments of the mass-to-charge (m/z) axis as new features. The area under the ROC curve (AUC) serves as the criterion for selecting important m/z segments, and a support vector machine (SVM) is used for classification. For four-class classification, our preprocessed data give 89% accuracy on the training samples and 63% on the test samples. The data preprocessed by Adam et al. give 94% and 86%, respectively. The results show that the preprocessing method does affect classification performance. They also confirm that analysis based on feature segments is feasible.
Keywords: AUC; Feature Selection; Classification; Segmentation; SELDI; SVM
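The two ideas in entry 4 can be sketched together: pooling intensities into m/z segments, then ranking segments by AUC. The sketch rests on these assumptions:

- It uses fixed-width windows on a shared m/z grid. The abstract does not give the segment width or how boundaries were chosen.
- It computes a two-class AUC by pairwise comparison, which is the Mann-Whitney formulation.
- It uses `max(AUC, 1 - AUC)` so that segments lower in one class also count as discriminative.
- All names are hypothetical.

For the four-class problem, one would need a one-vs-rest or pairwise extension. The abstract does not describe the one used.

```python
from typing import List, Sequence, Tuple


def segment_features(mz: Sequence[float], intensities: Sequence[float],
                     start: float, width: float, n_segments: int) -> List[float]:
    """Sum intensities inside fixed-width m/z windows [start + k*width, start + (k+1)*width).
    Points outside the covered range are ignored. Summing over a window absorbs
    small mass shifts that would otherwise split one peak across neighbouring bins."""
    out = [0.0] * n_segments
    for m, v in zip(mz, intensities):
        k = int((m - start) // width)
        if 0 <= k < n_segments:
            out[k] += v
    return out


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def rank_segments(x_pos: List[List[float]], x_neg: List[List[float]],
                  top_k: int) -> List[Tuple[int, float]]:
    """Rank segment columns by max(AUC, 1 - AUC), best first; keep the top_k."""
    scored = []
    for j in range(len(x_pos[0])):
        a = auc([row[j] for row in x_pos], [row[j] for row in x_neg])
        scored.append((j, max(a, 1.0 - a)))
    scored.sort(key=lambda t: t[1], reverse=True)  # stable: ties keep column order
    return scored[:top_k]
```

The selected segment indices would then define the SVM's input columns. Computing AUC on training samples only avoids leaking test information into feature selection.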
5	兩階段特徵選取法在蛋白質質譜儀資料之應用 / A Two-Stage Approach of Feature Selection on Proteomic Spectra Data 王健源, Wang, Chien-yuan Unknown Date (has links)
Early detection and treatment can reduce cancer mortality, so discovering cancer-related biomarkers for early detection and diagnosis is an important task. This study analyzes a real proteomic spectra data set from prostate cancer patients and healthy subjects. The data were collected in protein chip experiments using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS), a technology that effectively captures the protein features of biological samples. Without suitable preprocessing to remove experimental noise, a single mass spectrum can contain hundreds or thousands of features (peaks). To narrow the search for candidate protein biomarkers, only features that distinguish cancer patients from healthy subjects are considered. A genetic algorithm (GA) is a global optimization procedure modeled on the genetic evolution of biological organisms and has been shown to search complex high-dimensional spaces effectively. This study applies a GA-like algorithm (GAL) to select protein features that separate prostate cancer patients from healthy subjects. It also proposes two types of two-stage GAL (TSGAL) to address shortcomings of GAL.
Keywords: Feature Selection; Genetic Algorithm (GA); SELDI; Support Vector Machines (SVM)
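Entry 5 does not describe how GAL or TSGAL work. The sketch below is therefore a generic genetic algorithm for feature-subset selection, not the thesis's GAL:

- Each candidate is a 0/1 mask over features.
- Parents are chosen by two-way tournament.
- Children come from uniform crossover plus bit-flip mutation.
- The best mask is always carried over (elitism).

The `fitness` callable is an assumption. In the setting described above, it might be cross-validated SVM accuracy on the selected features, possibly penalized by subset size. All names and default parameters are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def genetic_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 30,
    generations: int = 50,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, List[float]]:
    """Return the best mask found and the best fitness after each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)[:]
    history = [fitness(best)]

    def tournament() -> Mask:
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover: each gene comes from either parent with prob. 1/2.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            # Bit-flip mutation: each gene flips independently.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best[:]
        history.append(fitness(best))
    return best, history
```

Because of elitism, the recorded best fitness never decreases. A two-stage variant could, for example, run a coarse search first and then rerun the GA on the surviving features, but that is only one possible reading of "two-stage".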
6	應用探勘技術於社會輿情以預測捷運週邊房地產市場之研究 / A Study of Applying Public Opinion Mining to Predict the Housing Market Near the Taipei MRT Stations 吳佳芸, Wu, Chia Yun Unknown Date (has links)
Because the Internet is convenient and immediate, online news has become one of the main channels through which the public receives and shares news. The accumulated news reflects the public's real-time reactions to specific topics, how popular those topics are, and the direction of sentiment about them. This study uses opinion mining and sentiment analysis to uncover valuable relationships in domain-specific news. It combines them with conventional machine learning to build a housing market prediction model that can support home-buying decisions. We collected 11,150 real estate news articles published between January 1, 2010 and June 30, 2014, together with 8,165 housing transaction records for properties within 250 meters of MRT stations. Opinion words were extracted from the news for sentiment analysis, and time series of housing-market sentiment and of transaction prices and volumes were built. Half-year moving averages, double moving averages, and growth slopes were used for three purposes: to gauge whether public opinion was optimistic or pessimistic about the housing market, to analyze the relationship between public sentiment and actual transactions, and to identify buying and selling opportunities. Sentiment was then combined with environmental factors affecting real estate to build station-level prediction models with a support vector machine (SVM). The results show that housing-market sentiment and transaction prices and volumes are correlated and fluctuate with a certain periodicity. The year before a new MRT line opens also affects the housing market around the whole MRT network. A suitable entry point into the housing market occurs when the transaction line crosses the sentiment line while both slopes point upward. The prediction model built from station sentiment and environmental variables reaches an average accuracy of 69.2% for stations on new MRT lines and 78% for popular stations on new lines. This indicates that it predicts housing hotspots well.
Keywords: Text Mining; Opinion Mining; Housing Market; Moving Average; Support Vector Machine
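The entry-point rule in entry 6 can be sketched as follows. The rule fires when the transaction moving average crosses the sentiment moving average while both are rising. This sketch rests on these assumptions:

- Both series are already on a comparable scale, for example normalized. The abstract does not say how price or volume and sentiment were put on one chart.
- The crossing is upward, with transactions moving from below to above sentiment. The abstract does not state the direction.
- The slope is the one-step change of each moving average.
- A window of 6 would correspond to the half-year average only if the data are monthly.
- The double moving average is not shown.
- The function names are hypothetical.

```python
from typing import List, Sequence


def moving_average(xs: Sequence[float], window: int) -> List[float]:
    """Trailing simple moving average; element i corresponds to xs[i + window - 1]."""
    return [sum(xs[i - window + 1 : i + 1]) / window for i in range(window - 1, len(xs))]


def entry_points(transactions: Sequence[float], sentiment: Sequence[float],
                 window: int) -> List[int]:
    """Indices (in the raw series) where the transaction MA crosses above the
    sentiment MA and both MAs have a positive one-step slope."""
    t_ma = moving_average(transactions, window)
    s_ma = moving_average(sentiment, window)
    signals = []
    for i in range(1, len(t_ma)):
        crossed_up = t_ma[i - 1] <= s_ma[i - 1] and t_ma[i] > s_ma[i]
        both_rising = t_ma[i] > t_ma[i - 1] and s_ma[i] > s_ma[i - 1]
        if crossed_up and both_rising:
            signals.append(i + window - 1)  # map back to the raw-series index
    return signals
```

For instance, `entry_points([1, 1, 1, 2, 4, 6], [3, 3, 3, 3, 4, 5], 2)` flags index 5. At that point the transaction average (5.0) overtakes the sentiment average (4.5), and both averages are rising.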
7	對使用者評論之情感分析研究－以Google Play市集為例 / Research into App user opinions with Sentimental Analysis on the Google Play market 林育龍, Lin, Yu Long Unknown Date (has links)
Global smartphone shipments keep rising, and app downloads in the popular app markets have exceeded 50 billion. In both the iOS and Android app markets, ratings and reviews strongly influence an app's ranking. Reviews let developers understand users' needs and respond before complaints turn into a crisis. However, with hundreds of new reviews every day, reading them one by one is time-consuming and does not give an integrated view of users' needs and problems. Text sentiment analysis typically uses supervised or unsupervised methods to analyze reviews. Supervised methods have been shown to reach high accuracy with simple document quantification. However, they cannot anticipate unknown trends and require labor-intensive manual labeling of document classes. This study analyzes app reviews along two dimensions, sentiment orientation and popular topics, and proposes a Chinese sentiment analysis method that combines unsupervised and supervised learning. Reviews are first labeled with unsupervised methods and summarized in visual reports. Supervised methods are then used to build classification models and verify their performance. In the experiments, a sentiment lexicon built from Chinese WordNet can determine whether a review is positive or negative. However, it performs poorly on negative reviews and needs improvement. For topic extraction, two clustering methods were tried. The better one measured the association strength between words with NPMI and then applied Concor from social network analysis, which gave good results. With supervised learning, classification accuracy reached 87% for sentiment orientation and 96% for popular topics. The study uses Chinese WordNet and social network analysis to develop an unsupervised method for labeling Chinese text categories, and provides a worked example of Chinese sentiment analysis. A comprehensive visualization report shows users' positive and negative feedback. The classification models can also be used to track the content of new reviews, giving app developers competitive intelligence in the market.
Keywords: Sentiment Analysis; Text Classification; Support Vector Machine; Social Network Analysis; Correspondence Analysis
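The NPMI word-association step named in entries 1, 3, and 7 can be sketched as follows. NPMI is defined as `log(p(x,y) / (p(x) p(y))) / -log p(x,y)`, which ranges over [-1, 1]. This sketch assumes document-level co-occurrence: p(w) is the fraction of reviews containing w, and p(x, y) is the fraction containing both words. The abstracts do not say whether a document or a sliding window was used. Only co-occurring pairs are scored. The Concor clustering that would consume this matrix is not shown.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def npmi_scores(docs: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """Document-level NPMI for every word pair (sorted tuple) that co-occurs at least once."""
    n = len(docs)
    sets = [set(d) for d in docs]
    df = Counter(w for s in sets for w in s)  # documents containing each word
    co = Counter(pair for s in sets for pair in combinations(sorted(s), 2))
    scores = {}
    for (x, y), c in co.items():
        p_xy = c / n
        if p_xy == 1.0:
            # The pair occurs in every document: -log p(x,y) = 0, NPMI defined as 1.
            scores[(x, y)] = 1.0
            continue
        pmi = math.log(p_xy / ((df[x] / n) * (df[y] / n)))
        scores[(x, y)] = pmi / -math.log(p_xy)
    return scores
```

With reviews `[["crash", "update"], ["crash", "update"], ["easy", "ui"], ["easy"]]`, the pair ("crash", "update") always appears together and scores 1.0. The pair ("easy", "ui") scores 0.5, because "easy" also appears without "ui".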
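8	對於高維度資料進行特徵選取-應用於分類蛋白質質譜儀資料 / Feature Selection for High-Dimensional Data: Application to Classifying Proteomic Mass Spectrometry Data 黃仁澤 Unknown Date (has links)
Traditional tumor-marker screening methods often have limited sensitivity, availability, and specificity, and cannot provide accurate, timely diagnoses. Current cancer research uses proteomics to observe, through spectra and images, how protein expression changes across cancer stages, in the hope of developing better diagnostic tools. This study focuses on two sets of proteomic mass spectrometry data from prostate cancer patients. The data were collected with protein chips and surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS). The goal is to select, from a large number of protein features, a subset that helps classification. We propose two methods for the initial ranking and selection of features by discriminative power. The first is a minimum-misclassification-rate feature selection method. The second is a minimum-p-value feature selection method, based on hypothesis tests including the Kruskal-Wallis test. We further develop k-means extraction, maximum-correlation-coefficient extraction, and coefficient-of-determination extraction to reduce severe collinearity among the variables. Support vector machines (SVM) are used for classification and evaluation. Discriminative protein features are extracted for each classification goal to determine the best feature set. The minimum-misclassification-rate and minimum-p-value methods, combined with coefficient-of-determination extraction, perform well for every classification goal and are the better feature selection approaches.
Keywords: Proteomics; Protein Mass Spectrometry; Feature Selection; Support Vector Machine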
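Entry 8 names correlation-based "extraction" steps for removing collinear features after ranking but gives no details. The sketch below is therefore a hypothetical stand-in, not the thesis's procedure:

- It walks the features in their ranked order.
- It keeps a feature only if its squared pairwise correlation (r², the coefficient of determination of a one-variable fit) with every feature already kept is at most a threshold.

The threshold of 0.8 and the function names are assumptions. The thesis's coefficient-of-determination method may instead regress each candidate on all kept features.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def filter_collinear(columns: List[Sequence[float]], ranked: List[int],
                     max_r2: float = 0.8) -> List[int]:
    """Greedy redundancy filter over feature indices already ranked best-first.
    A feature is kept only if its pairwise r^2 with every kept feature is <= max_r2."""
    kept: List[int] = []
    for j in ranked:
        if all(pearson(columns[j], columns[k]) ** 2 <= max_r2 for k in kept):
            kept.append(j)
    return kept
```

Here `ranked` would come from a first-stage criterion such as the minimum misclassification rate or minimum p-value. The surviving indices then form the SVM's feature set. Because the filter is greedy, a higher-ranked feature always wins over a collinear lower-ranked one.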

1	應用情感分析於輿情之研究-以台灣2016總統選舉為例 / A Study of using sentiment analysis for emotion in Taiwan's presidential election of 2016 陳昭元, Chen, Chao-Yuan Unknown Date (has links) 從2014年九合一選舉到今年總統大選，網路在選戰的影響度越來越大，後選人可透過網路上之熱門討論議題即時掌握民眾需求。文字情感分析通常使用監督式或非監督式的方法來分析文件，監督式透過文件量化可達很高的正確率，但無法預期未知趨勢，耗費人力標注文章。本研究針對網路上之政治新聞輿情，提出一個混合非監督式與監督式學習的中文情感分析方法，先透過非監督式方法標注新聞，再用監督式方法建立分類模型，驗證分類準確率。在實驗結果中，主題標注方面，本研究發現因文本數量遠大於議題詞數量造成TFIDF矩陣過於稀疏，使得TFIDF-Kmeans主題模型分類效果不佳；而NPMI-Concor主題模型分類效果較佳但是所分出的議題詞數量不均衡，然而LDA主題模型基於所有主題被所有文章共享的特性，使得在字詞分群與主題分類準確度都優於TFIDF-Kmeans和NPMI-Concor主題模型，分類準確度高達97%，故後續採用LDA主題模型進行主題標注。情緒傾向標注方面，證實本研究擴充後的情感詞集比起NTUSD有更好的字詞極性判斷效果，並且進一步使用ChineseWordnet 和 SentiWordNet，找出詞彙的情緒強度，使得在網友評論的情緒計算更加準確。亦發現所有文本的情緒指數皆具皆能反應民調指數，故本研究用文本的情緒指數來建立民調趨勢分類模型。在關注議題分類結果的實驗，整體正確率達到95%，而在民調趨勢分類結果的實驗，整體正確率達到85%。另外建立全面性的視覺化報告以瞭解民眾的正反意見，提供候選人在選戰上之競爭智慧。 / From Taiwanese local elections, 2014 to Taiwan presidential elections, 2016. Network is in growing influence of the election. The nominee can immediately grasp the needs of the people through a popular subject of discussion on the website. Sentiment Analysis research encompasses supervised and unsupervised methods for analyzing review text. The supervised learning is proved as a powerful method with high accuracy, but there are limits where future trend cannot be recognized, and the labels of individual classes must be made manually. In the study, we propose a Chinese Sentiment Analysis method which combined supervised and unsupervised learning. First, we used unsupervised learning to label every articles. Secondly, we used supervised learning to build classification model and verified the result. According to the result of finding subject labeling, we found that TFIDF-Kmeans model is not suitable because of document characteristic. NPMI-Concor model is better than TFIDF-Kmeans model. But the subject words is not balanced. However, LDA model has the feature that all subject is share by all articles. LDA model classification performance can reach 97% accuracy. 
So we choose it to decide article subject. According to the result of sentimental labeling, the sentimental dictionary we build has higher accuracy than NTUSD on judging word polarity. Moreover, we used ChineseWordnet and SentiWordNet to calculate the strength of word. So we can have more accuracy on calculate public’s sentiment. So we use these sentiment index to build prediction model. In the result of subject labeling, our accuracy is 95%. Meanwhile, In the result of prediction our accuracy is 85%. We also create the Visualization report for the nominee to understand the positive and the negative options of public. Our research can help the nominee by providing competitive wisdom. 情感分析文字分類支援向量機 Sentiment Analysis Text Classification SVM
2	應用情感分析於指數型證券投資信託基金趨勢預測之研究 / Research into sentimental analysis to predict exchange-traded fund trend 黃泓銘, Huang, Hung-Ming Unknown Date (has links) 近年來ETF規模快速成長，亞洲區域經濟成長與穩步發展更是帶動國際ETF市場動力來源，而元大台灣50指數型證券投資信託基金因規模大，受到投資人的青睞。根據過去的研究指出，網路上的文本訊息會對群眾情緒造成影響，進而影響股價波動，對投資者而言，若能從大量網路財金快速分析投資者大眾情緒進而預測股價波動走勢，勢必可提高報酬率。然而，每日有上百篇的財金文本產生，人工分析耗時耗力，本研究採用文字探勘技術，提出一套情感分析的價格預測模型。過去文本情感分析的研究中已證實監督式學習方法可以透過簡單量化的方式達到良好的分類效果，然而，為解決監督式學習無法預期未知的限制，本研究透過非監督式學習將2016整年度的財金文本進行文章主題判別，計算情緒指數並標記文本情緒傾向，再來使用監督式學習結合台股資訊指標、國際指標、總體經濟指標、技術指標等，建立分類模型以預測元大台灣50ETF的價格趨勢。實驗結果中，主題標注方面，本研究發現因文本數量遠大於議題詞數量造成TF-IDF矩陣過於稀疏，使得TF-IDF結合K-means主題模型分類效果不佳。LDA主題模型基於所有主題被所有文章共享的特性，使得在字詞分群優於TF-IDF結合K-means。情緒傾向標注方面，證實本研究擴充後的情感詞集比起NTUSD有更好的字詞極性判斷效果。本研究透過比較情緒指數結合技術指標之分類模型與單純技術指標分類模型的準確率發現，前者較後者高出7%的準確率。進一步結合間接情緒指標的分類模型更有71%準確率，故證實財金文本的情感分析確實能有效提升元大台灣50的價格趨勢預測。 / Rapid and stable economic growth in Asia motivated the asset scale of ETF in the globe growing rapidly in the recent years. Yuanta Taiwan Top 50 ETF gains the investors’ favor because of the advantages of large market scale. Past research have shown that the text documents on the internet, e.g. news and tweets, would make great effect on public emotion, and the public emotion could even affect the stock price. For investors, it is important to know how to analyze the potential emotion in text documents to predict the stock trend. However, the traditional way to analyze text documents by human cannot afford the large volume of financial text documents on the internet. In past sentimental analysis research, supervised method is proven as a method with high accuracy, but there are limits about predicting unknown future trend. This research combined supervised and unsupervised methods to deal with these large financial text documents. By using unsupervised method to find out the topic of documents, and then calculate the sentimental index of each documents to differentiate the sentiment polarity. 
Afterwards, using supervised method to build a prediction model with the sentimental index. According to the result, we found that the performance of LDA model is better than the TF-IDF with K-means model. Moreover, the prediction model which include the sentiment index has higher accuracy than the one include the technical indicators only. 情感分析 LDA主題模型支援向量機 ETF Sentimental analysis LDA SVM ETF
3	運用財經文本情感分析於台灣電子類股價指數趨勢預測之研究 / Research of applying Sentimental Analysis on financial documents to predict Taiwan Electronic Sub-Index trend 劉羿廷 Unknown Date (has links) 電子工業為台灣最具競爭力之產業,使得電子類股在集中市場成交比重高達 69.49%,可見電子類股的波動足以對整個台股市場造成相當大的影響。而許多研究指出,網路上的文本訊息藉由社會網路的催化而快速傳遞,會對群眾情緒造成影響,進而影響股價波動,故對於投資者而言,如果能快速分析大量網路財經文本來推測投資大眾情緒進而預測股價走勢,即可提升獲利。然而,每天有近百篇的財經文本產生,傳統的人工抽樣分析方式效率不彰且過於耗力, 已不足以負荷此巨量資料。過去文本情感分析的研究中已證實監督式學習方法可以透過簡單量化的方式達到良好的分類效果,但監督式學習方法所使用的訓練資料集須有事先定義好的已知類別,故其有無法預期未知類別的限制,造成無法判斷文本中可能存在的未知主題,所以本研究提出一套針對財經文本的混合監督式學習與非監督式學習之情感分析方法,透過非監督式學習將 2014 整年度的電子工業財經文本進行文本主題判別、情緒指數計算與情緒傾向標注。之後配合視覺化工具作趨勢線圖分析,找出具有領先指標特性之主題,接著再用監督式學習將其結合國際指標、總體經濟指標、台股指標、技術指標等,建立分類模型以預測台灣電子類股價指數走勢。在實驗結果中,主題標注方面,本研究發現因文本數量遠大於議題詞數量造成 TFIDF 矩陣過於稀疏,使得 TFIDF-Kmeans 主題模型分類效果不佳;而文本具有多主題之特性造成 NPMI-Concor 分群之議題詞過於複雜不易歸納,然而LDA 主題模型基於所有主題被所有文章共享的特性,使得在字詞分群與主題分類準確度都優於 TFIDF-Kmeans 和 NPMI-Concor 主題模型,分類準確度高達 98%,故後續採用 LDA 主題模型進行主題標注。情緒傾向標注方面,證實本研究擴充後的情感詞集比起 NTUSD 有更好的字詞極性判斷效果,計算出的情緒指數之趨勢線也較投資人常用的 MACD 之趨勢線更符合電子類股價指數之趨勢。此外,亦發現並非所有文本的情緒指數皆具有領先特性,僅企業營運主題與總體經濟主題之文本的情緒指數能提前反應電子類股價指數趨勢,故本研究用此二主題之文本的情緒指數來建立分類模型。接著,本研究透過比較情緒指數結合技術指標之分類模型與單純技術指標分類模型的準確率發現,前者較後者高出 7%的準確率。進一步結合間接情緒指標的分類模型更有高達 71%準確率,故證實了情感分析確實能有效提升電子股價類股指數趨勢預測準確度,以提升投資人之投資報酬率。 / The electronic industry is the most competitive industry in Taiwan, and its large volume could have strong influence on the whole stock market. Many research show that text documents on the Internet have great effect on public emotion, and the public emotion could also affect the stock price. For investors, it is important to know how to analyze the potential emotion in text documents then use this information to predict the stock trend. However, the traditional way to analyze text documents by human resource cannot afford the large volume of financial text documents on the Internet. In past Sentimental Analysis research, supervised method is proven as a method could reach high accuracy, but there are limits about predicting the future trend. 
This research found a solution which mixed supervised and unsupervised methods to deal with these large financial text documents. First, we use unsupervised method to find out the topic of documents, and then calculate the sentimental index to judge the document’s emotional direction. After that we will produce trend line charts by visualization tools to find out which theme documents’ sentiment index are leading indicators. Furthermore, we use supervised method to integrate the sentimental index with other 24 indirect sentimental index to build the prediction model. According to the result, we found that LDA model’s performance is better than TFIDF-Kmeans model and NPMI-Concor mode because of document characteristic. Besides, sentimental dictionary I build has higher accuracy than NTUSD on judging word polarity. The trend of sentimental index and Taiwan electronic sub-index(TE) to each other is more similar than MACD line and TE to each other. We also discover that the sentiment index produced from documents about enterprise operation and macroeconomics are leading indicators, so we use these to build prediction model. Moreover, we found that the prediction model which include the sentiment index better than which only include the technical indicators. As mentioned above, the sentimental index could make the prediction of Taiwan electronic sub-index trend be more accurate and promote the return of investment. 情感分析巨量資料 LDA 主題模型支援向量機電子類股價指數 Sentimental analysis Big Data LDA SVM Taiwan Electronic Sub-Index Trend
4	使用AUC特徵選取方法在蛋白質質譜儀資料分類之應用 / An AUC criterion for feature selection on classifying proteomic spectra data 葉勝宗 Unknown Date (has links) 表面增強雷射脫附遊離/飛行時間質譜(SELDI-TOF-MS)是種屬於高維度的蛋白質質譜儀資料，主要是用來偵測蛋白質分子的表現。由於SELDI技術的限制，導致掃描出來的質譜儀資料往往存在誤差與雜訊，因此在分析前通常會先針對原始資料進行低階的事前處理，步驟包括去除基線、正規化、峰偵測(peak detection)與峰調準(peak alignment)。本文中所探討前列腺癌資料，可分成正常、良性腫瘤、癌症初期與癌症末期四種類別。我們分析及比較兩筆事前處理的蛋白質質譜資料，包括我們自行處理的以及Adam等人所處理的資料。為了解決SELDI在偵測分子質量時常出現的位移誤差以及同位素的問題，我們提出以”質荷比段落”當作新的特徵變數的想法來進行分析。本文利用「ROC曲線下面積」(AUC)當作選取的準則來挑選出重要的質荷比段落，而分類方法則採用支援向量機(SVM)。在四分類的分類結果中，我們自行處理的事前處理資可以得到訓練資料89%及測試資料63 %的正確率。而Adam等人所處理的事前處理資料，則得到訓練資料94%及測試資料86 %的正確率。本研究結果指出不同事前處理的方法對分類結果確實有影響，同時也驗證了利用”特徵變數段落”的方法來進行分析的可行性。 / The surface enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) is a technique for presenting the expression of molecular masses. It is obvious that every spectrum has a huge dimension of features. In order to analyze these types of spectra samples, preprocessing steps are necessary. The steps of preprocessing include baseline subtraction, normalization, peak detection, and alignment. In our study, we use a prostate cancer data for demonstration. This prostate cancer data can be classified into four categories, namely, healthy men, benign prostate hyperplasia, early stage prostate cancer, and late stage prostate cancer. We analyzed both the preprocessed data processed by ourselves and the preprocessed data done by Adam et al.. In this thesis, we use segmentations of features as “new features” in attempt to solve problems due to location shifts and isotopes. The selection of important segmentations was based on the values of AUC and the SVM was applied for classification. For four-class classification, 94 % and 86 % of accuracy were obtained for training samples and validation samples, respectively, by using Dr. Adam et al.’s preprocessed data, and 89% for training samples, and 63% for validation samples by using our preprocessed data. 
This study suggested that the preprocessed method does have effect on classification result and a reasonable classification result can be obtained by using segmentations of features. 特徵選取分類 ROC曲線下面積支援向量機 AUC feature selection classification segmentation SELDI SVM
5	兩階段特徵選取法在蛋白質質譜儀資料之應用 / A Two-Stage Approach of Feature Selection on Proteomic Spectra Data 王健源, Wang,Chien-yuan Unknown Date (has links) 藉由「早期發現，早期治療」的方式，我們可以降低癌症的死亡率。因此找出與癌症病變有關的生物標記以期及早發現與治療是一項重要的工作。本研究分析了包含正常人以及攝護腺癌症病人實際的蛋白質質譜資料，而這些蛋白質質譜資料是來自於表面強化雷射解吸電離飛行質譜技術（SELDI-TOF MS）的蛋白質晶片實驗。表面增強雷射脫附遊離飛行時間質譜技術可有效地留存生物樣本的蛋白質特徵。如果沒有經過適當的事前處理步驟以消除實驗雜訊，ㄧ個質譜中可能包含多於數百或數千的特徵變數。為了加速對於可能的蛋白質生物標記的搜尋，我們只考慮可以區分癌症病人與正常人的特徵變數。基因演算法是一種類似生物基因演化的總體最佳化搜尋機制，它可以有效地在高維度空間中去尋找可能的最佳解。本研究中，我們利用仿基因演算法(GAL)進行蛋白質的特徵選取以區分癌症病人與正常人。另外，我們提出兩種兩階段仿基因演算法(TSGAL)，以嘗試改善仿基因演算法的缺點。 / Early detection and diagnosis can effectively reduce the mortality of cancer. The discovery of biomarkers for the early detection and diagnosis of cancer is thus an important task. In this study, a real proteomic spectra data set of prostate cancer patients and normal patients was analyzed. The data were collected from a Surface-Enhanced Laser Desorption/Ionization Time-Of-Flight Mass Spectrometry (SELDI-TOF MS) experiment. The SELDI-TOF MS technology captures protein features in a biological sample. Without suitable pre-processing steps to remove experimental noise, a mass spectrum could consists of more than hundreds or thousands of peaks. To narrow down the search for possible protein biomarkers, only those features that can distinguish between cancer and normal patients are selected. Genetic Algorithm (GA) is a global optimization procedure that uses an analogy of the genetic evolution of biological organisms. It’s shown that GA is effective in searching complex high-dimensional space. In this study, we consider GA-Like algorithm (GAL) for feature selection on proteomic spectra data in classifying prostate cancer patients from normal patients. In addition, we propose two types of Two-Stage GAL algorithm (TSGAL) to improve the GAL. 特徵選取基因演算法支援向量機 Feature Selection Genetic Algorithm (GA) SELDI Support Vector Machines (SVM)
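6	應用探勘技術於社會輿情以預測捷運週邊房地產市場之研究 / A Study of Applying Public Opinion Mining to Predict the Housing Market Near the Taipei MRT Stations 吳佳芸, Wu, Chia Yun Unknown Date (has links) Because the Internet is convenient and immediate, online news has become one of the main channels through which the public receives and passes on news, and the huge volume of accumulated news reflects public opinion's immediate reactions to particular news topics, how popular those topics are, and the direction of sentiment toward them. This study therefore applies opinion mining and sentiment analysis to uncover valuable associations in domain-specific news, and combines them with traditional machine learning to build a housing-market prediction model that can inform home-buying decisions. We collected 11,150 real-estate news articles published between January 1, 2010 and June 30, 2014 (ROC years 99 to 103), together with 8,165 transaction records for homes within 250 meters of MRT stations. Opinion words were extracted through opinion mining for sentiment analysis, and time series of housing-market sentiment and of transaction prices and volumes were built. Using six-month moving averages, double moving averages, and growth slopes, we examined whether public opinion was optimistic or pessimistic about the housing market and analyzed the association between public sentiment and actual real-estate transactions, in order to identify when to buy or sell. We then combined sentiment with environmental factors affecting real estate and used support vector machines to build station-level housing-market prediction models. The empirical results show that housing-market sentiment and fluctuations in transaction prices and volumes share a certain periodicity and correlation, and that the year before a new MRT line opens also affects fluctuations in the housing market around the whole MRT network. When the transaction line crosses the sentiment line while both slopes point upward, this can serve as an appropriate entry point into the housing market. The prediction model built from station sentiment and environmental variables achieved an average accuracy of 69.2% for stations on new MRT lines and 78% for popular stations on new MRT lines, indicating good predictive ability for popular stations. / Nowadays, online news (e-news) has become an important way for people to get daily information. This enormous amount of news can reflect public opinion on particular topics and sentiment trends in news coverage. How to use opinion mining and sentiment analysis to dig out valuable information from domain-specific news has therefore become an emerging issue. In this study, we collected 11,150 housing news articles and 8,165 transaction records for homes within 250 meters of MRT stations over the last five years. We extracted emotion words from the news using opinion mining. Furthermore, we built moving-average lines and computed their slopes to explore the relationship between public opinion and the housing market and to identify entry points. We found a high correlation between news sentiment and the housing market. We also used an SVM algorithm to construct a model for predicting housing hotspots. The results demonstrate that the SVM model reaches an average accuracy of 69.2%, and its accuracy rises to 78% for predicting housing hotspots.
In addition, by analyzing moving-average crossovers and slopes, we provide investors with a basis for choosing an entry point into the housing market, as well as a better way of predicting housing hotspots. Keywords: Text Mining; Opinion Mining; Housing Market; Moving Average; Support Vector Machine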
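The entry-point rule described in this abstract (the transaction line crossing the sentiment line while both slopes point upward) can be sketched roughly as follows. This is an assumption-laden illustration, not the author's procedure: it uses a trailing simple moving average (a window of 6, i.e. half a year if the series are monthly), treats "crossing" as the transaction average moving from at or below the sentiment average to above it, and assumes both series were already normalized to a comparable scale. The double moving average mentioned in the Chinese abstract is not reproduced, and the function names are hypothetical.

```python
def moving_average(series, window):
    """Trailing simple moving average; None until `window` observations are available."""
    return [
        None if i + 1 < window else sum(series[i + 1 - window:i + 1]) / window
        for i in range(len(series))
    ]


def entry_signals(transactions, sentiment, window=6):
    """Indices where the transaction MA crosses above the sentiment MA while both MAs rise."""
    t_ma = moving_average(transactions, window)
    s_ma = moving_average(sentiment, window)
    signals = []
    for i in range(1, len(t_ma)):
        if t_ma[i - 1] is None or s_ma[i - 1] is None:
            continue  # not enough history yet to measure a slope
        crossed_up = t_ma[i - 1] <= s_ma[i - 1] and t_ma[i] > s_ma[i]
        both_rising = t_ma[i] > t_ma[i - 1] and s_ma[i] > s_ma[i - 1]  # positive one-step slopes
        if crossed_up and both_rising:
            signals.append(i)
    return signals
```

With `window=2`, a transaction series `[1, 1, 1, 3, 5]` and a sentiment series `[2, 2, 2, 2.5, 3]` produce a single signal at index 4, the first period in which the rising transaction average overtakes the rising sentiment average.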
7	對使用者評論之情感分析研究－以Google Play市集為例 / Research into App user opinions with Sentimental Analysis on the Google Play market 林育龍, Lin, Yu Long Unknown Date (has links) Global smartphone shipments keep rising, and app downloads in the popular app markets have each passed 50 billion. In the iOS and Android app markets, an app's ratings and reviews strongly influence its ranking; for app developers, reviews reveal users' needs and make it possible to respond quickly before complaints turn into a crisis. However, with up to hundreds of reviews arriving each day, reading them one by one is not only time-consuming but also fails to give an integrated picture of users' needs and problems. Text sentiment analysis usually analyzes reviews with supervised or unsupervised methods. Supervised methods have been shown to reach high accuracy with simple document quantification, but they cannot anticipate unknown trends and require labor-intensive manual labeling of document classes. This study analyzes app reviews along two dimensions, sentiment orientation and popular topics of concern, and proposes a Chinese sentiment analysis method that combines unsupervised and supervised learning. We first label review classes with an unsupervised method and present the results visually, and then build classification models with a supervised method and validate their performance. In the experiments, a sentiment lexicon built from the Chinese WordNet could indeed determine whether reviews were positive or negative, although it performed poorly on negative reviews and needs improvement. For topic extraction, we tried two clustering methods; measuring the strength of association between words with NPMI and then applying the Concor method from social network analysis gave good results. Finally, in supervised classification, accuracy reached 87% for sentiment orientation and 96% for topics of concern. This study uses the Chinese WordNet and social network analysis to develop an unsupervised method for determining Chinese text classes and provides a worked example of Chinese sentiment analysis. In addition, comprehensive visual reports reveal users' positive and negative feedback, and the classification models make it possible to track the content of new reviews, giving app developers competitive intelligence in the market. / While smartphone shipments continue to grow, the number of app downloads from the popular app markets has already exceeded 50 billion. In Apple's App Store and Google Play, ratings and reviews play an ever more important role in influencing app diffusion. Although app developers can learn users' needs from app reviews, the thousands of reviews users produce every day are difficult to read and collate. Sentiment analysis research encompasses supervised and unsupervised methods for analyzing review text. Supervised learning has proven useful and can reach high accuracy, but it cannot recognize future trends, and the labels of individual classes must be assigned manually. We concentrate on two issues, namely Sentiment Orientation and Popular Topic, and propose a Chinese sentiment analysis method that combines supervised and unsupervised learning. First, we use unsupervised learning to label every review and produce visualized reports. Second, we employ supervised learning to build classification models and verify the results.
In the experiments, the Chinese WordNet is used to build a sentiment lexicon that determines each review's sentiment orientation, but the results show that it is weak at identifying negative reviews. In the topic extraction phase, we apply two clustering methods to extract Popular Topic classes, and the NPMI model combined with the social network analysis method Concor gives excellent results. In the supervised learning phase, the accuracy is 87% for the Sentiment Orientation class and 96% for the Popular Topic class. In this research, we demonstrate an unsupervised method that uses the Chinese WordNet and social network analysis to determine review classes. We also build a comprehensive visualized report to capture users' feedback and use classification to explore new comments. Finally, the Chinese sentiment analysis method of this research can provide app developers with competitive intelligence in the app market. Keywords: Sentiment Analysis; Text Classification; Support Vector Machine; Social Network Analysis; Correspondence Analysis
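For readers unfamiliar with NPMI, the normalized pointwise mutual information of two words x and y is log(p(x,y) / (p(x)p(y))) divided by -log p(x,y), which ranges from -1 to 1. The sketch below estimates these probabilities from document-level co-occurrence over tokenized reviews; that estimation choice, the tokenization, and the function name are assumptions, since the abstract does not say how the author computed NPMI. The resulting pairwise scores are the kind of word-association matrix a Concor-style clustering step could take as input; Concor itself is not implemented here.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_scores(docs):
    """Document-level NPMI for every word pair that co-occurs in at least one document."""
    n = len(docs)
    word_sets = [set(doc) for doc in docs]
    word_df = Counter(w for s in word_sets for w in s)  # documents containing each word
    pair_df = Counter(pair for s in word_sets for pair in combinations(sorted(s), 2))
    scores = {}
    for (a, b), count in pair_df.items():
        p_ab = count / n
        if p_ab == 1.0:
            scores[(a, b)] = 1.0  # both words occur in every document: perfect association
            continue
        pmi = math.log(p_ab / ((word_df[a] / n) * (word_df[b] / n)))
        scores[(a, b)] = pmi / -math.log(p_ab)
    return scores
```

On the toy reviews `[["battery", "drain"], ["battery", "drain"], ["ui", "crash"], ["ui", "battery"]]`, the pair ("crash", "ui") scores exactly 0.5, while ("battery", "ui") is negative because the two words co-occur less often than independence would predict.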
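8	對於高維度資料進行特徵選取-應用於分類蛋白質質譜儀資料 / Feature Selection for High-Dimensional Data: An Application to Classifying Protein Mass Spectrometry Data 黃仁澤 Unknown Date (has links) Traditional tumor-marker screening methods are often limited in sensitivity, availability, and specificity, and cannot provide correct, timely diagnoses. Current cancer research uses proteomics to observe, through spectra and images, how protein expression changes at different stages of cancer, in the hope of developing better diagnostic tools. This study focuses on protein mass spectrometry data from two groups of prostate cancer patients, collected with protein chips and surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS). Our goal is to select, from a large number of protein features, a set of feature variables that help classification. We propose a minimum-misclassification-rate feature selection method and minimum-p-value feature selection methods (… test and Kruskal-Wallis test) for an initial ranking and selection of features by discriminative power, and we further develop a k-means extraction method, a maximum-correlation-coefficient extraction method, and a coefficient-of-determination extraction method to mitigate the severe collinearity among the variables. We classify with a support vector machine (SVM), evaluate classification performance, and extract discriminative protein features for different classification goals in order to determine the best feature set. We find that the minimum-misclassification-rate and minimum-p-value feature selection methods, combined with the coefficient-of-determination extraction method, perform well for every classification goal and are the better feature selection approaches. Keywords: Proteomics; Protein Mass Spectra; Feature Selection; Support Vector Machine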
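A minimal sketch of two ideas in this abstract, under stated assumptions: features are first ranked by the p-value of a two-group Kruskal-Wallis test (chi-square approximation with one degree of freedom, with the standard tie correction), and a greedy filter then skips any feature whose absolute Pearson correlation with an already selected feature exceeds a threshold. The filter is only a hypothetical stand-in for the thesis's maximum-correlation-coefficient extraction, whose exact rule the abstract does not give. Labels are assumed to be coded 0 and 1, and all names and the 0.9 threshold are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Kruskal-Wallis H for two groups and its chi-square (1 df) p-value."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    r_a, r_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3 * (n + 1)
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:
        return 0.0, 1.0  # every value identical: no evidence of a difference
    h = max(h / correction, 0.0)  # guard against tiny negative round-off
    # For 1 degree of freedom, P(chi2 > h) = erfc(sqrt(h / 2)).
    return h, math.erfc(math.sqrt(h / 2))


def pearson(x, y):
    """Pearson correlation; 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(X, y, k, max_corr=0.9):
    """Rank features by Kruskal-Wallis p-value, then greedily skip highly correlated ones."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    ranked = []
    for j, col in enumerate(cols):
        a = [v for v, lab in zip(col, y) if lab == 0]
        b = [v for v, lab in zip(col, y) if lab == 1]
        ranked.append((kruskal_wallis_two_groups(a, b)[1], j))
    selected = []
    for _, j in sorted(ranked):  # smallest p-value first; ties broken by feature index
        if all(abs(pearson(cols[j], cols[s])) <= max_corr for s in selected):
            selected.append(j)
        if len(selected) == k:
            break
    return selected
```

The greedy filter illustrates why a collinearity step matters. A feature that is just a rescaled copy of an already selected one gets exactly the same p-value, so a pure p-value ranking would keep both. The correlation check drops the copy in favor of a weaker but non-redundant feature.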
