1 |
憂鬱傾向者之微博書寫分析 / Search for Depress Tendency: An Analysis on Chinese Micro-Blog Texts
任喆鸝, Ren, Zhe Li. Unknown Date
This study uses the social media platform Weibo to identify depressive writing and addresses two questions: (1) What characterizes the social media writing of people with depressive tendencies in China? (2) How can those characteristics be used to identify more depressive texts?
Starting from ten Weibo users with confirmed depression, snowball sampling through their Weibo social circles located 127 users with depressive tendencies. A total of 20,748 posts by these users were crawled to form the dataset, which was analyzed with content analysis, qualitative text analysis, word frequency analysis, and word co-occurrence analysis.
The results show the following. (1) After the texts were coded for tone, emotion, theme, and degree of depression, 62% of the posts had a negative tone and 25.1% were depressive texts. The themes with the highest levels of negativity and depression were "self", "family", "suicide", and "sleep disorders". (2) A closer qualitative analysis of depressive writing on the themes of "self" and "family" revealed psychological traits that set these users apart from the general population; "self-loathing" and "not being understood" were what they found hardest to let go of. (3) Because "suicide" and "sleep disorders" are characteristic of people with depression, co-occurring word groups built from theme-related words help identify this population: the co-occurrence words for "sleep disorders" identified depressive texts at a rate of 74%, and those for "suicide" at a rate of 34%. In future work, machine-based methods could refine this approach and improve the identification rate.
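The co-occurrence idea in point (3) can be illustrated with a small sketch. This is not the thesis's implementation: the function names, the toy English tokens, and the labels below are all invented for illustration, and the "identification rate" is assumed here to mean the share of flagged posts that were manually coded as depressive (the abstract does not define it). Real Weibo text would first need Chinese word segmentation, which is omitted.

```python
from collections import Counter


def cooccurrence_counts(posts, theme_word):
    """Count words that appear in the same post as theme_word."""
    counts = Counter()
    for tokens in posts:
        unique = set(tokens)
        if theme_word in unique:
            counts.update(unique - {theme_word})
    return counts


def flag_posts(posts, theme_word, partner_words):
    """Return indices of posts with theme_word and at least one partner word."""
    partners = set(partner_words)
    return [
        i for i, tokens in enumerate(posts)
        if theme_word in tokens and partners & set(tokens)
    ]


def identification_rate(flagged, labels):
    """Share of flagged posts labelled depressive; None if nothing was flagged."""
    if not flagged:
        return None
    return sum(labels[i] for i in flagged) / len(flagged)


# Toy, invented data: pre-tokenised posts and manual labels (1 = depressive).
posts = [
    ["insomnia", "again", "tired", "hopeless"],
    ["insomnia", "coffee", "exam"],
    ["insomnia", "hopeless", "alone"],
    ["sunny", "picnic", "friends"],
]
labels = [1, 0, 1, 0]

counts = cooccurrence_counts(posts, "insomnia")
flagged = flag_posts(posts, "insomnia", ["hopeless"])
rate = identification_rate(flagged, labels)
```

In this toy setting, requiring a theme word plus a co-occurring partner word narrows the match to posts 0 and 2, whereas the theme word alone would also match the non-depressive post 1.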
|
2 |
透過Spark平台實現大數據分析與建模的比較:以微博為例 / Accomplish Big Data Analytic and Modeling Comparison on Spark: Weibo as an Example
潘宗哲, Pan, Zong Jhe. Unknown Date
Rapidly growing and changing data, together with fast-evolving analysis tools, make data analysis increasingly challenging. This study walks through a complete machine learning workflow to offer a reference blueprint for academic and corporate teams adopting big data analytics. We use Apache Spark as the computing framework and build machine learning models with MLlib's two packages, Spark.ml and Spark.mllib, to address problems that can arise in traditional data analysis. Along the way, we compare the scenarios to which different Spark modules are suited. Development is done first on a local cluster, and the jobs are then submitted to an Amazon cloud cluster (AWS EC2) to speed up modeling and analysis. The workflow is demonstrated on Weibo, using the 2012 mainland China Weibo dataset provided by the Journalism and Media Studies Centre at the University of Hong Kong. We use RDD, Spark SQL, and GraphX to extract features from users' posts, and build a Random Forest model for the binary classification task of predicting whether a user is officially verified.
|