Global ETD Search

1	文件距離為基礎kNN分群技術與新聞事件偵測追蹤之研究 / A study of relative text-distance-based kNN clustering technique and news events detection and tracking
陳柏均 (Chen, Po Chun), Unknown Date
A news event can be described as "a collection of similar news articles on the same topic within a given time interval." Most news articles cover only a fragment of a complete event, and their content varies with each outlet's stance or the reporter's angle; in addition, the sheer volume of news makes it much harder to grasp an event as a whole. This study therefore uses text mining techniques to cluster related news articles into events, increasing the value that news provides. Classification and clustering are common steps in text mining and are the main methods this study uses to group news into events. k-nearest neighbor (kNN) search is one of the most common classification algorithms, but kNN must compare articles pairwise and sort the results before the nearest neighbors can be selected, which creates a performance bottleneck in practice. This study proposes RTD-based kNN (Relative Text-Distance-based kNN), a method that establishes a distance reference point. A base point is created in the vector space, and every document is positioned by its relative distance to that base point. Before the top k nearest neighbors are selected, these relative distances are used to filter out the more likely candidate documents, from which the top k are then chosen; this reduces the number of comparisons and improves efficiency. The study draws 62 events (742 news articles in total) from Google News and uses their groupings as the basis for testing and evaluation, comparing the performance of RTD-based kNN and kNN in news event clustering. The experimental results show that the RTD-based kNN base point works better when built from commonly used vocabulary, and that re-merging clusters after clustering helps improve the results. With no significant difference in F-measure between RTD-based kNN and kNN (α=0.05), RTD-based kNN reduces computation time by 28.13% relative to kNN. This indicates that RTD-based kNN offers a better method for news event clustering. Finally, the study suggests some directions for future research.
Keywords: Text Mining; kNN; Events Detection and Tracking; Classification and Clustering
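The reference-point idea in the abstract above can be sketched in a few lines. The abstract does not give the exact filtering rule, so this sketch assumes one plausible reading: documents are sorted by their Euclidean distance to the base point, and the triangle inequality is used to stop the search early. The class name `ReferencePointIndex`, the method `knn`, and the choice of Euclidean distance are assumptions made for illustration, not the thesis's implementation.

```python
import bisect
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ReferencePointIndex:
    """Hypothetical index: documents sorted by distance to one base point."""

    def __init__(self, docs, base):
        self.docs = docs
        self.base = base
        keyed = sorted((euclidean(d, base), i) for i, d in enumerate(docs))
        self.base_dists = [bd for bd, _ in keyed]  # ascending distances to base
        self.order = [i for _, i in keyed]         # document ids in that order

    def knn(self, query, k):
        """Return (neighbors, comparisons); neighbors are (distance, doc_id)."""
        q_base = euclidean(query, self.base)
        pos = bisect.bisect_left(self.base_dists, q_base)
        lo, hi = pos - 1, pos
        n = len(self.order)
        heap = []  # max-heap of the best k so far, stored as (-distance, doc_id)
        comparisons = 0
        while lo >= 0 or hi < n:
            # Visit candidates in order of increasing gap to the query's base distance.
            gap_lo = q_base - self.base_dists[lo] if lo >= 0 else math.inf
            gap_hi = self.base_dists[hi] - q_base if hi < n else math.inf
            if gap_lo <= gap_hi:
                gap, slot = gap_lo, lo
                lo -= 1
            else:
                gap, slot = gap_hi, hi
                hi += 1
            # Triangle inequality: dist(query, doc) >= gap, so once the gap
            # exceeds the current k-th best distance no later candidate can win.
            if len(heap) == k and gap > -heap[0][0]:
                break
            doc_id = self.order[slot]
            d = euclidean(query, self.docs[doc_id])
            comparisons += 1
            if len(heap) < k:
                heapq.heappush(heap, (-d, doc_id))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, doc_id))
        neighbors = sorted((-nd, doc_id) for nd, doc_id in heap)
        return neighbors, comparisons
```

Under these assumptions the result matches a brute-force search, and the returned `comparisons` count shows how many full distance computations were needed. The abstract reports that a base point built from commonly used vocabulary worked better; here `base` is simply whatever vector the caller supplies.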
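2	整合文件探勘與類神經網路預測模型之研究 -以財經事件線索預測台灣股市為例 / A study integrating document mining and a neural network prediction model: predicting the Taiwan stock market from financial event clues
歐智民, Unknown Date
Globalization and advances in information technology have greatly accelerated the speed at which media transmit information, so that news events related to the stock market have increased in both volume and frequency compared with the past, which in turn affects the stock market. Most investors today are already equipped with traditional investment concepts, observe macroeconomic trends and indicators, and analyze price charts to predict closing prices. Beyond this, identifying the key news events that support investment decisions within a large volume of news is a skill that must also be developed, and it is the part investors are less familiar with; this study sets out to examine it.
This study uses financial news from the 2009 Liberty Times e-paper (自由時報電子報), 5,767 articles in total, as its data source, clusters it with the text-distance-based kNN technique, and applies the concept of time intervals to improve the timeliness of clustering. The clustering results are then classified through a category lexicon into positive, neutral, and negative news events and are fed, together with quantitative stock market data including trading volume, closing price, and 3-day closing price, into a back-propagation neural network prediction model. Trading information for semiconductor stocks was obtained from the Taiwan Economic Journal (台灣經濟新報) and divided into training and test data covering 168 and 83 trading days respectively; the prediction model was built through the network's iterative learning process and compared with the original prediction model.
The results show, first, that the category lexicon can be built from stock closing-price returns and by filtering terms on their frequency of occurrence, allowing investors to reduce the amount of information in news documents through clustering and classification. Second, adding the classified news events to the back-propagation neural network prediction model significantly improves, according to statistical significance tests at the 95% and 99% confidence levels, both the directional correctness and the accuracy of next-day closing price predictions; in other words, adding news events to the prediction model helps predict the next day's closing price. Finally, the study points out some directions for future research.
Keywords: Events Detection and Tracking; kNN clustering; Back-propagation neural network; Prediction model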
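The step in which clustered news events are labeled by a category lexicon and combined with market data can be illustrated with a minimal sketch. The abstract does not say how events are encoded for the network, so the per-class counts used here are an assumption, as are the function names `classify_event` and `build_input_vector`, the English placeholder terms in the two lexicons, and the treatment of the 3-day closing price as a value supplied by the caller.

```python
from collections import Counter

# Hypothetical lexicon entries; the thesis derives its lexicon from
# closing-price returns and term frequency, which is not reproduced here.
POSITIVE_TERMS = {"growth", "record", "upgrade"}
NEGATIVE_TERMS = {"loss", "recall", "downgrade"}


def classify_event(tokens, positive=POSITIVE_TERMS, negative=NEGATIVE_TERMS):
    """Label one news event by comparing lexicon hits among its tokens."""
    pos_hits = sum(1 for t in tokens if t in positive)
    neg_hits = sum(1 for t in tokens if t in negative)
    if pos_hits > neg_hits:
        return "positive"
    if neg_hits > pos_hits:
        return "negative"
    return "neutral"


def build_input_vector(volume, close, close_3day, event_labels):
    """Join quantitative data with per-class event counts (assumed encoding)."""
    counts = Counter(event_labels)
    return [
        volume,
        close,
        close_3day,
        counts["positive"],
        counts["neutral"],
        counts["negative"],
    ]
```

A vector built this way for each trading day could serve as one input row for a back-propagation network; scaling the features before training would normally be needed and is left out of the sketch.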
