Global ETD Search

1	在高度分散式環境下進行Top-k相似文件檢索 / Similar Top-k documents retrieval in highly distributed environments
王俊閎, Wang, Chun Hung
Unknown Date

In query processing over document databases, a similar top-k documents query helps users retrieve, from a very large collection, the documents most relevant to a query document: the documents in the database are ranked by their similarity to the query document, and the k most similar are returned. Centralized databases, such as centralized web search engines, lack coverage and scalability, so this kind of rank-based document query costs a great deal of time and computation. In recent years, peer-to-peer (P2P) architectures have become a popular way to address these retrieval problems, but supporting rank-based similar-document queries in a highly distributed environment is difficult because there is no global knowledge and no suitable central coordinator. In this study, we first cluster each peer's local database as a pre-processing step, then use a zone-partitioning approach [1] to divide the P2P network into several sub-zones and build a feature index table. At query time, the index table speeds up the selection of the top-k similar clusters and ensures that an adequate number of results is returned. In the experiments, we compare our method with a traditional centralized search engine and with the SON-based approach [1]; in a highly distributed environment, our method performs better than both on similar top-k documents queries.

Keywords: distributed environments, similar top-k documents retrieval, peer-to-peer network
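The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: clusters are summarized in a feature index, the index is used to pick the clusters most similar to the query, and only documents in those clusters are ranked. Everything here is an assumption made for illustration, not the thesis's method: documents are modeled as term-weight dictionaries, each cluster's feature is its centroid, similarity is cosine similarity, and the names `build_index`, `top_k_documents`, and `n_clusters` are hypothetical.

```python
import heapq
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(docs):
    """Average term-weight vector of a non-empty list of documents."""
    total = Counter()
    for d in docs:
        total.update(d)  # Counter.update adds the weights term by term
    return {t: w / len(docs) for t, w in total.items()}


def build_index(zones):
    """Assumed feature index: one (zone, cluster_id, centroid) row per cluster."""
    return [
        (zone, cid, centroid(docs))
        for zone, clusters in zones.items()
        for cid, docs in clusters.items()
    ]


def top_k_documents(query, zones, k, n_clusters):
    """Pick the n_clusters most similar clusters via the index,
    then return the k most similar documents inside them."""
    index = build_index(zones)
    best = heapq.nlargest(n_clusters, index, key=lambda row: cosine(query, row[2]))
    candidates = [
        (cosine(query, doc), zone, cid, doc)
        for zone, cid, _ in best
        for doc in zones[zone][cid]
    ]
    return heapq.nlargest(k, candidates, key=lambda c: c[0])


# Toy data: two zones, three clusters, documents as term-weight dicts.
zones = {
    "zone_a": {
        "c1": [{"p2p": 1, "index": 1}, {"p2p": 1, "peer": 1}],
        "c2": [{"cooking": 1, "recipe": 1}],
    },
    "zone_b": {"c3": [{"p2p": 1, "query": 1, "topk": 1}]},
}
results = top_k_documents({"p2p": 1, "topk": 1}, zones, k=2, n_clusters=2)
for score, zone, cid, doc in results:
    print(f"{score:.3f} {zone} {cid} {sorted(doc)}")
```

In this toy run the unrelated cluster `c2` is never scanned at the document level, which is the saving that an index over cluster features is meant to provide. In a real P2P setting the index would be distributed across sub-zones rather than held in one dictionary.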
