When a disaster occurs, disaster information must be analyzed and disseminated in real time if it is to serve disaster prevention and relief. As network infrastructure has become widespread, providers of disaster information have joined the broad public media of the internet, and search-engine retrieval alone cannot reflect the current state of a disaster in real time. Traditional channels such as disaster response centers have limited reporting capacity and are often overwhelmed by the sudden burst of information. This surge of data can no longer be collected, filtered, and processed entirely by hand, so new tools are needed to classify information from new media channels quickly and automatically, for use by disaster prevention and relief systems and in government decision-making.
In this study, we collected data from five channels during the August 8 flood caused by Typhoon Morakot. After text processing and classification by experts, we compare the datasets in terms of frequency distribution, category composition, and word co-occurrence networks. Without considering part of speech or grammar, we use the vector space model to train a one-against-one SVM (OAO-SVM) classifier and evaluate the performance of automated classification.
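To make the classification setup concrete, the following is a minimal sketch, not the implementation used in the thesis. It represents each message as a bag-of-words term-frequency vector (no part of speech, no grammar) and applies the one-against-one scheme: one binary model per pair of classes, with the final label chosen by vote. To stay self-contained it swaps in a cosine nearest-centroid rule where the thesis trains a binary SVM; the category names, the sample messages, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def tf_vector(tokens):
    """Bag-of-words term-frequency vector; word order and grammar are ignored."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_oao(docs, labels):
    """Build one binary model per unordered pair of classes (one-against-one)."""
    sums = {}
    for tokens, label in zip(docs, labels):
        # Summed vectors act as centroids; cosine similarity ignores the scale.
        sums.setdefault(label, Counter()).update(tf_vector(tokens))
    # Stand-in for a binary SVM: each pairwise model is just two class centroids.
    return {(a, b): (sums[a], sums[b]) for a, b in combinations(sorted(sums), 2)}


def predict_oao(models, tokens):
    """Let every pairwise model vote; return the class with the most votes."""
    vec = tf_vector(tokens)
    votes = Counter()
    for (a, b), (ca, cb) in models.items():
        votes[a if cosine(vec, ca) >= cosine(vec, cb) else b] += 1
    return votes.most_common(1)[0][0]


# Hypothetical, already-tokenized messages with expert-assigned categories.
docs = [
    ["need", "rescue", "flood", "trapped"],
    ["trapped", "roof", "rescue", "boat"],
    ["donate", "water", "food", "supplies"],
    ["supplies", "blankets", "food", "shelter"],
    ["road", "closed", "bridge", "collapsed"],
    ["bridge", "closed", "detour", "road"],
]
labels = ["rescue", "rescue", "supplies", "supplies", "traffic", "traffic"]

models = train_oao(docs, labels)
print(len(models), "pairwise models")
print(predict_oao(models, ["family", "trapped", "need", "boat"]))
print(predict_oao(models, ["bridge", "road", "flooded"]))
```

With k classes the scheme trains k(k-1)/2 pairwise models, so the three hypothetical categories above yield three models. Replacing the centroid rule with a real binary SVM changes only the body of the pairwise model, not the voting logic.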
Our analysis shows that, after a disaster, information on the internet passes through distinct stages over time, so the progress of the disaster can be followed through each channel. The word co-occurrence networks show that writing by rescue experts repeats vocabulary less and is more heterogeneous than lay writing. With the OAO-SVM classifiers, channels written by rescue experts achieve better classification performance than channels written by lay users. In cross-channel comparison, a classifier performs better on content from channels of the same nature as its training data. By merging datasets of the same type for training, we found that a classifier performs well when the training data are of sufficiently high quality; when quality is insufficient, performance can be improved by increasing the amount of training data. The findings of this study, together with the classification methods and information exploration techniques it developed, can be used in the future to build more efficient and accurate social sensors.
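The word co-occurrence comparison can be illustrated with a small sketch. It assumes that two words co-occur when they appear in the same message, and it uses the share of word pairs seen in more than one message as a rough proxy for vocabulary repetition. That proxy, the sample messages, and the function names are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(messages):
    """Count, for each unordered word pair, how many messages contain both words."""
    edges = Counter()
    for tokens in messages:
        # Sorting the distinct words gives every pair one canonical key.
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return edges


def repeated_edge_share(edges):
    """Fraction of word pairs that occur in more than one message."""
    if not edges:
        return 0.0
    return sum(1 for weight in edges.values() if weight > 1) / len(edges)


# Hypothetical tokenized messages from an expert channel and a lay channel.
expert_msgs = [
    ["levee", "breach", "evacuate"],
    ["helicopter", "airlift", "village"],
]
lay_msgs = [
    ["help", "flood", "house"],
    ["help", "flood", "road"],
]

expert_edges = cooccurrence_network(expert_msgs)
lay_edges = cooccurrence_network(lay_msgs)
print(repeated_edge_share(expert_edges), repeated_edge_share(lay_edges))
```

In this toy data the lay channel reuses the pair ("flood", "help") across messages while the expert channel never repeats a pair, which mirrors in miniature the kind of contrast the abstract describes.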
Identifier | oai:union.ndltd.org:CHENGCHI/G0099753014 |
Creators | 施旭峰, Shih, Shiuh Feng |
Publisher | 國立政治大學 |
Source Sets | National Chengchi University Libraries |
Language | Chinese |
Detected Language | English |
Type | text |
Rights | Copyright © nccu library on behalf of the copyright holders |