深度學習於中文句子之表示法學習 / Deep learning techniques for Chinese sentence representation learning

This thesis investigates how recently developed deep learning techniques can be applied to learning distributed representations of Chinese sentences. Deep learning has attracted great attention in recent years, and related techniques have developed rapidly. However, most distributed representation methods have been evaluated mainly on English and other Indo-European languages, and were developed around the properties of those languages. Beyond the Indo-European family, the Sino-Tibetan and Altaic language families are also widely spoken, and there are languages such as Japanese and Korean that are treated as independent of these families, each with its own properties. Chinese belongs to the Sino-Tibetan family and has quite distinct characteristics: it is an isolating language, it is tonal, and it uses measure words. Many recent publications evaluate on multilingual datasets, but few discuss how performance differs across languages.

This thesis uses sentence-level sentiment classification experiments on a Chinese Weibo dataset to quantify the difference between recent deep learning techniques and traditional word-vector representations. Taking TF-IDF as the baseline, we compare it with three other models, PVDM, Siamese-CBOW, and FastText, and examine in depth how these models perform on Chinese sentence sentiment classification.

Identifier: oai:union.ndltd.org:CHENGCHI/G0103971010
Creators: 管芸辰 (Kuan, Yun Chen)
Publisher: National Chengchi University (國立政治大學)
Source Sets: National Chengchi University Libraries
Language: Chinese
Detected Language: English
Type: text
Rights: Copyright © nccu library on behalf of the copyright holders
