Global ETD Search

在高度分散式環境下進行Top-k相似文件檢索 / Similar Top-k documents retrieval in highly distributed environments

In query processing over document databases, a similar top-k document query helps users retrieve, from a large document collection, the documents most relevant to a query document. It ranks the documents in the database with a similarity function and returns the k documents with the highest similarity. Centralized databases and search engines, however, suffer from limited coverage and scalability, which makes such rank-based queries costly in both time and computation. Using peer-to-peer (P2P) architectures to address these issues has recently become a trend, but supporting rank-based similarity queries in a highly distributed environment is difficult because there is no global knowledge and no suitable central coordinator.
In this thesis, we propose a framework to address these problems. We first perform local cluster pre-processing on each peer's database, then apply a zone-creation process [1] that partitions the P2P network into several sub-zones, and finally construct a feature index table. At query time, the index table speeds up the selection of the top-k similar clusters and ensures that an appropriate number of results is returned. In our experiments, we compare our approach with a traditional centralized search engine and with the SON-based approach [1]. In a highly distributed environment, our approach outperforms both on similar top-k document queries.
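The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: summarize each peer's local clusters, pick the clusters closest to the query, then rank only the documents in those clusters. The function names (`tf_vector`, `build_index`, `top_k_documents`), the `n_clusters` parameter, the use of term-frequency vectors with cosine similarity, and the flat index layout are all assumptions made for illustration, not the thesis's actual method. Zone creation is omitted entirely.

```python
import heapq
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Summarize a cluster by the sum of its members' vectors.

    Cosine similarity ignores vector length, so the sum works as well as the mean.
    """
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def build_index(peers):
    """Build a hypothetical feature index: one (peer, cluster, centroid) row per cluster.

    `peers` maps peer id -> {cluster id -> list of document strings}.
    """
    index = []
    for peer_id, clusters in peers.items():
        for cluster_id, docs in clusters.items():
            index.append((peer_id, cluster_id, centroid(tf_vector(d) for d in docs)))
    return index


def top_k_documents(query, peers, index, k, n_clusters=2):
    """Return the k (score, document) pairs most similar to `query`.

    Only documents in the `n_clusters` best-matching clusters are scored.
    """
    q = tf_vector(query)
    best_clusters = heapq.nlargest(n_clusters, index, key=lambda row: cosine(q, row[2]))
    candidates = [
        doc
        for peer_id, cluster_id, _ in best_clusters
        for doc in peers[peer_id][cluster_id]
    ]
    scored = [(cosine(q, tf_vector(doc)), doc) for doc in candidates]
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])
```

In this sketch the index is consulted first, so documents outside the selected clusters are never scored. That is the kind of saving the abstract attributes to the feature index table, although the real system distributes the index across sub-zones rather than holding it in one list.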

http://thesis.lib.nccu.edu.tw/cgi-bin/cdrfb3/gsweb.cgi?o=dstdcdr&i=sid=%22G0099753034%22.

distributed environments

Top-k similar document retrieval

peer-to-peer networks

Identifier	oai:union.ndltd.org:CHENGCHI/G0099753034
Creators	王俊閎, Wang, Chun Hung
Publisher	National Chengchi University (國立政治大學)
Source Sets	National Chengchi University Libraries
Language	Chinese
Detected Language	English
Type	text
Rights	Copyright © nccu library on behalf of the copyright holders
