股市趨勢預測之研究 -財經評論文本情感分析 / Predict the trend in the stock by Sentiment analyzing financial posts

蔡宇祥, Tsai, Yu Shiang Unknown Date
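Previous research indicates that posts on social networking sites influence public sentiment, which in turn affects stock market volatility. For investors, being able to rapidly analyze large volumes of financial text from social media, infer investor sentiment, and thereby predict market movements could improve investment returns.
Earlier work on text sentiment analysis has shown that supervised learning can achieve good classification results through simple quantification, but its training data must carry predefined, known classes, so it cannot anticipate unknown classes. This study therefore uses a deep learning method to extract stock-related articles from a massive dataset and applies a hybrid of supervised and unsupervised sentiment analysis to financial text. Unsupervised learning is used to identify topics, compute a sentiment index, and label the sentiment orientation of financial posts on Weibo. Supervised learning is then used to build a classification model that predicts the trend of the Shanghai index. Finally, visualization tools are used for trend-line analysis to find topics that behave as leading indicators.
In the experiments, on the deep learning side, word2vec was used to retrieve relevant stock-topic articles, which effectively filtered the texts to be analyzed. On the topic-model side, the number of documents far exceeded the number of issue terms, which made the TF-IDF matrix too sparse and led to poor K-means clustering, so LDA was adopted for topic labeling. For sentiment orientation labeling, the expanded sentiment lexicon judged word sentiment scores better than NTUSD, and the trend line of the resulting sentiment index effectively anticipated the trend of the Shanghai index. Not every topic's sentiment index has leading properties, however: only the sentiment indices of the "company performance" and "Shanghai index" topics reflected the Shanghai index trend in advance, so the classification model was built from the texts of these two topics.
Comparing classification models, the one using the sentiment index was 7% more accurate than the one using index indicators alone. This confirms that sentiment analysis can improve the accuracy of Shanghai index trend prediction and help investors increase their returns.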
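The sentiment index step in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual formula, so the sketch assumes a simple lexicon-based score per post (positive hits minus negative hits, divided by total hits) averaged per day. The tiny lexicon, the pre-tokenized posts, the dates, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mini-lexicon for illustration only; the thesis reports using
# an expanded sentiment lexicon that it compares against NTUSD.
POSITIVE = {"rise", "gain", "profit"}
NEGATIVE = {"fall", "loss", "risk"}


def post_score(tokens):
    """Return (pos - neg) / (pos + neg) for one tokenized post, or 0.0 if no hits."""
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def daily_sentiment_index(posts):
    """Average the post scores per day. `posts` is an iterable of (date, tokens)."""
    by_day = defaultdict(list)
    for day, tokens in posts:
        by_day[day].append(post_score(tokens))
    return {day: sum(scores) / len(scores) for day, scores in sorted(by_day.items())}


# Made-up posts, already tokenized.
posts = [
    ("day-1", ["profit", "rise", "risk"]),
    ("day-1", ["fall", "loss"]),
    ("day-2", ["gain", "rise"]),
]
index = daily_sentiment_index(posts)
```

A daily series like `index` is the kind of trend line that could then be plotted against the market index or fed to a classifier as a feature.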

網路資訊尋求策略融入數位學習之研究-以Big6結合WebQuest於Moodle為例 / E-learning and information-seeking strategies:Combining Big6 and WebQuest on moodle

謝章威, Sia, Chong Wee Unknown Date
E-learning has grown rapidly in recent years, and web-based inquiry learning has become one of the learning models attracting attention. WebQuest is a mature model of this kind: most of the information learners use comes from the Internet, and the instructor plans and guides the learning process so that learners can carry out topic-based inquiry step by step. Many studies have found that WebQuest helps improve learners' higher-order thinking, cooperative learning, and problem-solving skills, which has made it popular with both instructors and students.
In WebQuest the instructor selects the web resources, so students do not need to search for resources themselves. In an age of information overload, however, students need information-seeking strategies to handle the vast and complex information on the Internet in an organized way. Information education has become an important issue in global development and is one of the seven major issues in Taiwan's Grade 1-9 Curriculum. Information-seeking strategies are closely tied to the curriculum's information education goals, and Big6 is regarded as the best way to cultivate information literacy.
This study therefore integrates Big6 skills into the WebQuest learning model. While the scaffolding of topic-based inquiry guides students' learning, students also practice searching for, processing, analyzing, presenting, and applying information, which brings them closer to the core competencies that the Grade 1-9 Curriculum sets for information education. A prototype was implemented to validate the concept. The results indicate that Big6 skills are suitable for integration into WebQuest and that the combined WebQuest and Big6 concept can be realized on the Moodle platform, where it is intended to be made available to instructors and learners in the future.

連繫功能詞與主題推進教學對增進EFL低成就者的寫作連貫性之研究 / The improvement of coherence in EFL low achievers' writing through the instruction of cohesive devices and thematic progression

林舒悠, Lin, Su Yu Unknown Date
Writing instruction is an indispensable part of English learning, yet in Taiwan it is still delivered mainly through the analysis of grammar and sentence patterns. This approach overlooks a primary key to successful writing: coherence. Because coherence is an obscure notion, teachers and students alike find it complicated to teach and to learn how to produce coherent text. This study investigated whether high school students' writing could be improved by teaching them two coherence strategies, cohesive devices (CD) and thematic progression (TP). The study was conducted without disrupting the regular teaching schedule. The researcher's 39 second-year students at a senior high school in Taipei were first taught how to analyze the coherence of textbook reading passages and were then required to apply and examine CD and TP in their own writing. Two low achievers were further selected so that their use of the two strategies could be examined in writing conferences, and their learning process was explored through interviews and journals.
The quantitative and qualitative data show that the two low-proficiency students obtained notably high scores in both holistic writing performance and coherence. They were also able to use the same categories of CD (reference, conjunction, and reiteration) and the same types of TP (simple linear progression and progression with a constant theme) as the high-proficiency learners in previous studies. Over the five-month study, with the help of CD and TP, the participants also developed the ability to attend more closely to content and to revise appropriately. Because the participants held positive attitudes toward learning CD and TP, the researcher recommends combining the two and integrating them into regular English writing instruction in Taiwan, so that analyzing CD and TP in written texts helps students grasp the abstract notion of coherence and improve their writing.

主題性路跑參賽者之經濟效益分析 / Evaluating the Benefits of Runners of Theme Run in Taipei

劉仕傑, Shi-Jay Liu January 1900
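This study evaluates the economic benefits associated with participant satisfaction for two themed road-running events held in Taipei, the 劍中劍‧風中劍路跑檢定賽 and the 台北星光夜跑 Taipei Night Run. It develops a benefit model using the travel cost method and consumer surplus, and uses factor analysis and cluster analysis to examine participants' motivations. Factor analysis extracted two factor dimensions, "physical and mental health" and "interpersonal relationships". After the number of clusters was set at three, one-way ANOVA and Scheffé's post hoc test were used to name the clusters "self-pursuit", "social and interpersonal", and "general". These three cluster types are used as factors influencing the participants' demand curve. For participants (runners) and surrounding industries, the two events are estimated to generate about NT$6.39 million and NT$13.86 million in output, and about NT$60.82 million and NT$131.79 million in economic benefits. Road running has become popular in recent years, and there is growing concern about the impact of themed road-running events held in Taipei. Objective, concrete estimates of runners' economic benefits could serve as indicators when subsidies and approvals for such events are assessed, and could help the Taipei City Government and organizers improve future events.
Contents: Abstract; Table of Contents; List of Figures; List of Tables; Chapter 1 Introduction (1.1 Research Background and Motivation; 1.2 Research Purpose; 1.3 Research Questions; 1.4 Research Scope; 1.5 Research Process); Chapter 2 Literature Review (2.1 Definition of Terms; 2.2 Literature Review; 2.3 Summary); Chapter 3 Theoretical Framework (3.1 Sample Analysis; 3.2 Framework Analysis; 3.3 Summary); Chapter 4 Empirical Model and Results (4.1 Factor Analysis and Cluster Analysis of Runners' Motivations; 4.2 Empirical Results); Chapter 5 Conclusions and Recommendations (5.1 Conclusions; 5.2 Recommendations); References; Appendix.
List of Figures: Figure 1-1 Research process flowchart; Figure 3-1 Satisfaction percentages; Figure 3-2 Participation motivation percentages; Figure 3-3 Gender and marital status percentages; Figure 3-4 Age percentages; Figure 3-5 Education level and occupation percentages; Figure 3-6 Average monthly income percentages; Figure 3-6 Change in runner demand caused by a change in the satisfaction index; Figure 4-1 Consumer surplus from the road-running events joined by participants over a full year.
List of Tables: Table 2-1 Summary of studies on road-running participants' motivation and satisfaction; Table 2-2 Summary of studies on the travel cost method and economic benefit analysis; Table 4-1 Factor analysis of runners' participation motivations; Table 4-2 One-way ANOVA of runners' motivation factors; Table 4-3 Definitions and means of the main variables; Table 4-4 Sample means of number of runs joined, running cost, and satisfaction index; Table 4-5 Estimated coefficients of the road-running demand function; Table 5-1 Consumer surplus, runner expenditure, and economic benefit analysis of the running events.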
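The consumer surplus idea behind the travel cost method can be sketched as follows. The abstract does not state the functional form of the estimated demand function, so this example assumes a linear demand curve, trips = intercept + slope × cost, for which the surplus is the triangle between the current travel cost and the choke price. The coefficients are hypothetical and are not estimates from the thesis.

```python
def consumer_surplus_linear(intercept, slope, cost):
    """Consumer surplus under linear demand: trips = intercept + slope * cost.

    The surplus is the triangle between the current travel cost and the
    choke price (the cost at which predicted trips fall to zero).
    """
    if slope >= 0:
        raise ValueError("demand must slope downward (slope < 0)")
    trips = intercept + slope * cost
    if trips <= 0:
        return 0.0  # nobody participates at this cost, so there is no surplus
    choke_price = -intercept / slope
    return 0.5 * (choke_price - cost) * trips


# Hypothetical coefficients, not estimates from the thesis.
cs = consumer_surplus_linear(intercept=10.0, slope=-0.002, cost=2000.0)
```

With these made-up values the runner takes 6 trips a year, the choke price is 5,000, and the surplus is 0.5 × 3,000 × 6 = 9,000 per runner per year. Scaling a per-runner figure by the number of participants is one common way to reach an event-level benefit, though the thesis may aggregate differently.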

以社群媒體輔助新聞主題探索的視覺化資訊系統 / A Visualization Information System to Assist News Topics Exploration with Social Media

林靖雅, Lin, Ching Ya Unknown Date
With the spread of social media, user-generated content (UGC) has become a frequent source of material for journalists. Given the explosive volume of social media data, however, reporters struggle to see the full picture of an event and treat social media merely as one more news source. As a result, coverage often copies netizens' opinions or falls into one-sided discussion, and fails to make use of the rich data that social media offers. To address this, the study divides news gathering into three actions (exploring events, collecting material, and retracing context) to help reporters explore news topics. Using Twitter data as an example and the web as the platform, it develops an information system that helps reporters explore events on social media and discover news topics. The system applies network analysis and natural language processing and presents an event dataset as story elements through a visual interface. Four story element models offer different perspectives on the dataset, and by adjusting the weights of the four elements users can restore the context of the tweets and find the content they want to see. A two-stage task-based experiment and an evaluation questionnaire were designed to assess the system's usability. The results support the value of the system: journalists with different levels of familiarity with an event were able to explore news topics on the platform and write either editorial leads for an in-depth report or a news story, and with the system's assistance users explored and tracked an event more quickly.
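The weight adjustment described in this abstract can be pictured with a small sketch. The abstract names neither the four story element models nor the scoring formula, so the sketch assumes each model yields a score between 0 and 1 per tweet and that the user's weights are combined as a normalized weighted sum. The element names (`actor`, `place`, `time`, `keyword`), the scores, and the function name are hypothetical.

```python
def rank_tweets(scores, weights):
    """Rank tweet ids by the weighted sum of their per-element scores.

    scores:  {tweet_id: {element_name: score}}
    weights: {element_name: user-chosen weight}
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    combined = {
        tweet_id: sum(weights[e] * s.get(e, 0.0) for e in weights) / total
        for tweet_id, s in scores.items()
    }
    # Highest combined score first; ties are broken by tweet id.
    return sorted(combined, key=lambda t: (-combined[t], t))


# Made-up scores from four hypothetical story element models.
scores = {
    "t1": {"actor": 0.9, "place": 0.1, "time": 0.2, "keyword": 0.3},
    "t2": {"actor": 0.2, "place": 0.8, "time": 0.5, "keyword": 0.4},
}
by_actor = rank_tweets(scores, {"actor": 1.0, "place": 0.0, "time": 0.0, "keyword": 0.0})
by_place = rank_tweets(scores, {"actor": 0.0, "place": 1.0, "time": 0.0, "keyword": 0.0})
```

Shifting all the weight from one element to another reverses the ranking here, which is the kind of change in perspective that an adjustable-weight interface is meant to give the user.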

測量華語的真實溫度:以幻想主題分析方法閱讀臺灣華語熱潮 / Measuring the real temperature of Chinese : a fantasy theme criticism of Chinese trend in Taiwan

蔣宗諴 Unknown Date
Since the Executive Yuan formally approved the establishment of a national policy committee for teaching Chinese as a foreign language in 2003, Chinese language teaching in Taiwan has developed rapidly. University and private Chinese language centers have multiplied, academic departments and graduate programs in the field have sprung up, and related policies have been actively pursued, so that the surrounding discourse has gradually formed the image of a "Chinese language fever". But what is the nature of this "heat"? Is it mania, a morbid frenzy? Heat, a temperature that burns? Hot, a trend that people covet? Or fever, a dizzying fantasy?
This thesis interprets the Chinese language trend in Taiwan using the fantasy theme criticism proposed by Ernest G. Bormann, taking a reflective, macro-level look at teaching Chinese as a second language (TCSL). It asks: in the imagination of those involved in TCSL, what is TCSL, and why does it matter? How did the trend in Taiwan take shape and spread? What appeal does it hold that draws participants to support it? By taking apart and reassembling rhetorical texts, the thesis presents the world as group members perceive it and uses a rhetorical perspective to understand how the trend operates and what it essentially is, in the hope of positioning TCSL in Taiwan more clearly and planning its future direction more soundly. It also aims to offer the field in Taiwan another way to reflect on and view itself.
The analysis shows the following. In the setting theme, the scene of "China's rise" gains dramatic tension through "globalization" and highlights the importance of a key prop, the Chinese language. In the character theme, Chinese language teachers, mainland China's teaching of Chinese as a foreign language, and national markets take the stage one by one and move between business and diplomacy through language teaching. The diverse opportunities that TCSL offers, together with appeals to national sentiment, are important factors behind the formation of the trend. The fantasy themes follow two fantasy types, "treasuring Chinese/Taiwanese culture" and "emphasizing public/individual benefits", which interweave into a value system of culture and interest, converge into a panorama of the discourse, and finally open onto the rhetorical vision "Taiwan, never absent".
Read through fantasy theme criticism, the trend shows mechanisms such as globalization, internationalization, and localization operating subtly under the packaging of "language" and "teaching". The rhetorical texts share an expectant, hopeful, and positive worldview. They often stress the distinction between in-group and out-group, attempt to draw professional boundaries, and hint at a sense of superiority. The discourse is also filled with imaginings of a greater Chinese identity that treat orthodoxy and tradition as the only values, and the narratives present a vague and empty subject in which the needs of the other are magnified without limit while the subjectivity of the language and its culture is squeezed. These rhetorical features reflect the current state and possible future direction of TCSL in Taiwan and merit reflection.

華語與韓語表達存在的對比分析及針對韓籍學生的華語教學策略 / Expressing existence in Mandarin and Korean: a contrastive analysis and application of teaching Mandarin to Korean students

李善禎, IY, Seon cheong Unknown Date
Existential sentences are common across human languages. Mandarin existential sentences have two main word orders: "location + verb + person/thing", as in (1), and "person/thing + verb + location", as in (2).
(1) 桌子上有書。 Zhuozi shang you shu. ('There is a book on the table.')
(2) 書在桌子上。 Shu zai zhuozi shang. ('The book is on the table.')
The subject of a zai (在) sentence is a definite or referential person/thing, whereas a you (有) sentence can take only an indefinite person/thing as its object. Mandarin linguists usually call the you sentence a "presentative sentence" (存現句), which expresses that something exists at some place, or that something located at some place appears or disappears. Formally, the locative noun appears at the beginning of the sentence, and the noun phrase denoting the person or thing that exists, appears, or disappears follows the verb. From a language-learning point of view, this construction is relatively unfamiliar to foreign students. This thesis analyzes the Mandarin presentative sentence from the perspective of functional grammar, using cognitive analysis, presentative function, information transmission, focus, topic, the progression from definite to indefinite, and the definiteness of nouns as the theoretical framework for its formation and structure.
Korean has no sentence pattern corresponding to the Mandarin presentative sentence. The existential verbs zai 'to be at' (在), you 'to have' (有), and shi 'to be' (是) are all translated into Korean as 있다 (itta), and presentative sentences with the particle zhe (著) are translated as "V + 아/어/여 (ɑ/ə/yeo) 있다 (itta)". The basic word order of Korean existential sentences is "locative phrase + noun phrase + existential verb", for example:
(3) 산 에 나무 가 있다. San e namu ga itta. ('There are trees on the mountain.'; gloss: mountain, at, tree, have)
This thesis examines how Korean existential sentences express topic, focus, and definiteness and how they convey information, and contrasts them with Mandarin existential sentences. It also proposes teaching strategies from a functional perspective, designs a grammar course on presentative sentences for Korean students, and applies it in actual teaching, in order to explore the feasibility of applying functional grammar in teaching.

應用情感分析於指數型證券投資信託基金趨勢預測之研究 / Research into sentimental analysis to predict exchange-traded fund trend

黃泓銘, Huang, Hung-Ming Unknown Date
ETF assets have grown rapidly in recent years, driven in large part by steady economic growth in Asia, and the Yuanta Taiwan Top 50 ETF is favored by investors because of its large scale. Previous research indicates that text on the Internet influences public sentiment and thereby stock price movements, so investors who can rapidly analyze public investor sentiment from large volumes of online financial text and predict price trends stand to improve their returns. Hundreds of financial articles are published every day, however, and manual analysis is slow and laborious. This study therefore applies text mining to propose a sentiment-analysis-based price prediction model.
Earlier sentiment analysis research has shown that supervised learning can achieve good classification through simple quantification, but it is limited in that it cannot anticipate the unknown. To address this limitation, the study uses unsupervised learning to identify the topics of financial texts covering all of 2016, compute a sentiment index, and label each text's sentiment orientation. It then uses supervised learning, combining the sentiment index with Taiwan stock market indicators, international indicators, macroeconomic indicators, and technical indicators, to build a classification model that predicts the price trend of the Yuanta Taiwan Top 50 ETF.
For topic labeling, the number of documents far exceeded the number of issue terms, which made the TF-IDF matrix too sparse, so the topic model combining TF-IDF with K-means classified poorly. The LDA topic model, in which all topics are shared by all documents, grouped words better than TF-IDF with K-means. For sentiment orientation labeling, the expanded sentiment lexicon judged word polarity better than NTUSD.
The classification model combining the sentiment index with technical indicators was 7% more accurate than the model using technical indicators alone, and a model that additionally incorporated indirect sentiment indicators reached 71% accuracy. This confirms that sentiment analysis of financial text can improve prediction of the Yuanta Taiwan Top 50 price trend.
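The sparsity problem reported for the TF-IDF matrix can be made concrete with a small sketch. It assumes plain term frequency times log(N / document frequency) with no smoothing, which may differ from the weighting used in the thesis. The toy documents, the vocabulary, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs, vocabulary):
    """Return one TF-IDF row per document, with columns ordered as `vocabulary`."""
    n_docs = len(docs)
    # Document frequency: in how many documents each vocabulary term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc) if term in vocabulary)
    rows = []
    for doc in docs:
        counts = Counter(doc)
        row = []
        for term in vocabulary:
            tf = counts[term] / len(doc) if doc else 0.0
            idf = math.log(n_docs / doc_freq[term]) if doc_freq[term] else 0.0
            row.append(tf * idf)
        rows.append(row)
    return rows


def sparsity(matrix):
    """Fraction of entries that are exactly zero."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0.0) / len(cells)


# Toy corpus: short documents that each touch only part of the vocabulary.
docs = [["etf", "rise"], ["etf", "fall"], ["market", "rise"], ["market", "fall"]]
vocabulary = ["etf", "market", "rise", "fall"]
matrix = tfidf_matrix(docs, vocabulary)
share_of_zeros = sparsity(matrix)
```

Each short document touches only half of the vocabulary, so half of the matrix is zero even in this toy case. With many short posts, most rows carry very few non-zero entries, which is one reason distance-based clustering such as K-means can struggle on such data.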

應用主題探勘與標籤聚合於標籤推薦之研究 / Application of topic mining and tag clustering for tag recommendation

高挺桂, Kao, Ting Kuei Unknown Date
Social tagging has been a popular way for users to interpret and share information since Web 2.0. As an alternative to traditional classification, it is convenient and flexible, so users can easily tag content. It also has drawbacks: a great deal of content carries no tags, and many tags are vague or imprecise, which weakens a system's ability to organize and classify by tag. To address these two problems, this study proposes an automated tag recommendation method that combines topic mining and tag clustering, with the aim of establishing recommendation rules that require no manual steps and suggest suitable tags to users.
The study collected 2,500 popular Chinese-language articles from the Pixnet (痞客邦) blog platform, each with more than 5,000 views. After preprocessing, 1,939 articles were used to train the model and 400 were used as the test corpus. In the topic mining part, an LDA topic model computes the topic semantics of each article and associates them with existing tags, so that the topics of a new article can be predicted and topic-related tags recommended for it. Perplexity, a measure of model performance, is used to help choose the number of LDA topics, which reduces the need to choose that number subjectively. In the tag clustering part, hierarchical clustering groups tags that have appeared together in order to find tags with similar semantic concepts. The stopping condition is a minimum co-occurrence count of 1, which removes the need to specify the number of clusters in advance and lets the method find a suitable number automatically.
The experiments show that recommending tags according to article topic semantics is feasible to a certain degree, and that the number of topics chosen with the help of perplexity gives more consistent results. Tags in the same hierarchical cluster do share similar concepts. Finally, combining topic mining with tag clustering raised Top-1 to Top-5 accuracy by an average of 14.1%, with Top-1 accuracy reaching 72.25%. This indicates that an approach built on how people write articles and assign tags can improve tag recommendation accuracy, and that the study has established automated recommendation rules that help users tag their articles more conveniently and precisely after writing them.
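The tag clustering step, with its stopping condition of at least one co-occurrence, can be sketched as follows. The abstract does not state which linkage criterion the hierarchical clustering uses, so this example assumes single linkage. Under that assumption, merging until no pair of clusters shares a co-occurring tag is equivalent to taking the connected components of the tag co-occurrence graph, which the sketch computes with a union-find structure. The tag data and the function name are hypothetical.

```python
from itertools import combinations


def cluster_tags(articles_tags, min_cooccurrence=1):
    """Group tags linked by at least `min_cooccurrence` shared articles.

    Assumes single linkage: with this stopping rule, agglomerative merging
    yields the connected components of the co-occurrence graph.
    """
    cooccurrence = {}
    parent = {}
    for tags in articles_tags:
        unique = sorted(set(tags))
        for tag in unique:
            parent.setdefault(tag, tag)
        for a, b in combinations(unique, 2):
            cooccurrence[(a, b)] = cooccurrence.get((a, b), 0) + 1

    def find(tag):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]
            tag = parent[tag]
        return tag

    for (a, b), count in cooccurrence.items():
        if count >= min_cooccurrence:
            parent[find(a)] = find(b)

    clusters = {}
    for tag in parent:
        clusters.setdefault(find(tag), set()).add(tag)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Made-up tag lists, one per article.
articles_tags = [
    ["travel", "japan"],
    ["japan", "food"],
    ["python", "coding"],
    ["diary"],
]
clusters = cluster_tags(articles_tags)
```

Here `travel` and `food` never appear on the same article but end up in one cluster through `japan`, while `diary` stays alone. No cluster count is supplied, which mirrors the automatic stopping behavior the abstract describes.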


古本, 真 26 November 2018
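Kyoto University / 0048 / New-system doctorate by coursework / Doctor of Letters / 甲第21408号 / 文博第779号 / Division of Behavioral Studies, Graduate School of Letters, Kyoto University / (Chief examiner) Professor 吉田 豊, Professor 吉田 和彦, Associate Professor 千田 俊太郎, Professor 米田 信子 / Qualified under Article 4, Paragraph 1 of the Degree Regulations / DFAM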

