  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

在高度分散式環境下進行Top-k相似文件檢索 / Similar Top-k documents retrieval in highly distributed environments

Wang, Chun Hung (王俊閎), Unknown Date
In query processing over large document databases, a similar top-k documents query helps users retrieve, from a very large collection, the documents most highly correlated with a query document. It ranks the documents in the database with a similarity function and returns the k documents with the highest similarity. Centralized approaches such as centralized search engines, however, suffer from limited coverage and scalability, which makes this kind of rank-based query costly in both time and computation. Using peer-to-peer (P2P) architectures to tackle these issues has recently become a trend, but supporting rank-based similarity queries in a highly distributed environment is difficult because there is no global knowledge and no suitable central coordinator.

In this study, we propose a framework that addresses these problems. We first cluster the local database on each peer as a pre-processing step, then apply a zone-partitioning approach [1] that divides the P2P network into several sub-zones, and finally build a feature index table. During query processing, the index table speeds up the selection of the top-k similar clusters and ensures that an appropriate number of results is returned. In our experiments, we compare the proposed method with a traditional centralized search engine and with a SON-based approach [1]; in a highly distributed environment, our method performs better than both when executing similar top-k documents queries.
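
The two-stage idea in the abstract (consult a cluster-level index first, then rank documents only inside the most promising clusters) can be illustrated with a minimal single-machine sketch. This is not the thesis's implementation: the abstract does not name the similarity function, the clustering method, or the index layout. Cosine similarity over whitespace-tokenized term counts, the centroid-based `build_index`, and the `probe` parameter are all assumptions made here for illustration, and the P2P zone creation step is not modeled.

```python
import heapq
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumption: lowercase whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Sum of member vectors; cosine ignores scale, so no averaging is needed."""
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return total


def build_index(clusters):
    """Hypothetical feature index: cluster id -> centroid of its documents."""
    return {
        cid: centroid([vectorize(doc) for doc in docs])
        for cid, docs in clusters.items()
    }


def top_k(query, clusters, index, k, probe=2):
    """Return the k documents most similar to `query`.

    Only the `probe` clusters whose centroids best match the query are
    searched, so the result is approximate when relevant documents sit
    in clusters that were not probed.
    """
    q = vectorize(query)
    # Stage 1: select the most similar clusters using the index alone.
    best = heapq.nlargest(probe, index, key=lambda cid: cosine(q, index[cid]))
    # Stage 2: rank only the documents held by the selected clusters.
    candidates = [doc for cid in best for doc in clusters[cid]]
    return heapq.nlargest(k, candidates, key=lambda doc: cosine(q, vectorize(doc)))
```

Raising `probe` trades extra comparisons for a lower chance of missing a relevant document, which loosely mirrors the abstract's concern with returning an appropriate number of results.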
