兩種中文情感運算分析策略: 以部首為基礎及深層類神經學習 / Two Chinese Sentiment Analysis Approaches: Radical-based and Deep Learning Neural Network
趙逢毅, Chao, August F.Y. Unknown Date
Opinions are central to human behavior because they are a key factor shaping how we act. In both personal and organizational decision making, we constantly analyze opinions of many kinds, reading closely what authors express in their text in order to filter out information that supports decisions. Early opinion research treated the task as a text classification problem, sorting opinions into positive and negative groups. Around 2000, researchers began to decompose whole texts by analyzing subjective sentences and the degree adjectives used in opinions, and to explain the relationship between word choice and sentiment orientation from a linguistic perspective. Analyzing opinions through semantic relations in this way means that text mining must be combined with natural language processing in order to understand opinion content accurately. With the steady emergence of new machine learning algorithms and natural language processing methods, and with online activity expanding into electronic commerce and virtual communities, sentiment analysis research has flourished.
Chinese differs from most other languages in several ways: text is written without spaces between words, each character is a morpheme, and the word (typically composed of several characters) is the smallest independent unit of meaning. These features make it difficult to apply sentiment analysis principles developed for Western languages such as English. Researchers have nevertheless exploited these same features to propose sentiment analysis approaches specific to Chinese. In this study, we take a practical view and propose strategies for two situations: (a) sentiment analysis resources and expert linguistic knowledge are available; and (b) domain word feature vectors (word2vec) are available for deep learning. For scenario (a), we examine the applicability of radical information to sentiment analysis and propose a method for extracting observed word/radical groups from domain opinion texts. We extracted 50 radical groups as seeds and obtained good sentiment classification performance on opinions from similar domains. For scenario (b), we compared four feature selection approaches and propose a word-weight filtering principle suited to deep neural network learning, which maintains accuracy while ensuring that suitable, understandable features can still be generated in the network. We also verified that the same weighting principle offers a similar advantage with a support vector machine (SVM) classifier. Finally, we discuss the constraints and information required for sentiment analysis in both scenarios, so that the results of this study can serve as a foundation for further Chinese sentiment analysis research.
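As a rough illustration of the idea behind scenario (a), the sketch below turns a Chinese sentence into a vector of radical counts that a classifier could consume. It is not the method of the thesis: the abstract does not list the 50 radical groups or describe the extraction procedure, so the `RADICAL_OF` table, the function names, and the chosen radicals are all hypothetical and kept tiny for readability.

```python
from collections import Counter

# Hypothetical, hand-picked character-to-radical table for illustration only.
# A real system would load a complete dictionary rather than ten entries.
RADICAL_OF = {
    "情": "心", "怒": "心", "悲": "心", "快": "心", "恨": "心",
    "好": "女", "笑": "竹", "哭": "口", "吃": "口", "河": "水",
}


def radical_counts(text):
    """Count the radicals of the characters in `text` found in RADICAL_OF."""
    return Counter(RADICAL_OF[ch] for ch in text if ch in RADICAL_OF)


def radical_features(text, radical_groups):
    """Return one count per radical in `radical_groups`, in the given order."""
    counts = radical_counts(text)
    return [counts[r] for r in radical_groups]


# Example: 哭 maps to 口, 情 maps to 心, 好 maps to 女; other characters are skipped.
features = radical_features("他又哭又笑,心情很好", ["心", "口", "女"])
print(features)
```

Working at the character level sidesteps word segmentation, which is one reason radical information may be attractive when expert language resources are available; whether it helps in a given domain is an empirical question.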