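With the growth of the World Wide Web, Web sites have attracted large numbers of users. Analyzing the browsing behavior shared by most users of a site not only helps in designing and updating the site's structure, but also makes it possible to offer effective personalized services to users who share the same browsing behavior.
Existing research on user browsing behavior mostly mines path traversal patterns or Web page sequential patterns. We therefore propose a new kind of Web browsing pattern, the Web traversal walk, along with two algorithms, AM and PM, for mining frequent Web traversal walks.
Algorithm AM is designed for the case where the data to be processed is too large to be held entirely in main memory. AM follows the spirit of the Apriori algorithm to mine frequent Web traversal walks. Algorithm PM is designed for the case where the data, after transformation, fits in main memory. PM builds a tree structure in main memory to further compress the large volume of data in the original database, and uses this tree to progressively mine all frequent Web traversal walks. Under the experimental assumptions, both AM and PM showed linear execution time and scalability. / With the progressive expansion in the size and complexity of Web sites on the World Wide Web, much research has been done on the discovery of useful and interesting Web traversal patterns.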
Most existing approaches focus on mining path traversal patterns or sequential patterns. In this paper, we present a new kind of Web traversal pattern, the Web traversal walk. A Web traversal walk is the complete trail of a user's traversal behavior within a single Web site. Mining Web traversal walks is more helpful for understanding and predicting how a Web site is accessed.
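To make the notion of a walk concrete, here is a minimal Python sketch. The abstract does not give a formal definition, so the sketch assumes that a session is the ordered list of pages one user visited, that a walk is a contiguous run of pages within a session (revisits allowed), and that a walk's support is the number of sessions containing it. The names `contains_walk` and `support` and the page labels are hypothetical.

```python
from typing import Sequence

def contains_walk(session: Sequence[str], walk: Sequence[str]) -> bool:
    """Return True if `walk` occurs as a contiguous run inside `session`."""
    n, m = len(session), len(walk)
    return any(tuple(session[i:i + m]) == tuple(walk) for i in range(n - m + 1))

def support(sessions, walk) -> int:
    """Count the sessions that contain `walk` at least once."""
    return sum(contains_walk(s, walk) for s in sessions)

# Hypothetical sessions; page names are made up for illustration.
sessions = [
    ["A", "B", "A", "C"],   # the user returns to A before visiting C
    ["A", "B", "C"],
    ["B", "A", "C", "D"],
]

print(support(sessions, ["B", "A", "C"]))
```

Under these assumptions a backward move such as A, B, A stays part of the trail instead of being discarded, which may be one way to read "complete trail" in the abstract.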
Two efficient algorithms, AM and PM, are proposed to discover Web traversal walks. PM is used when the database fits in main memory, and AM when it does not. AM is based on the Apriori property and discovers all frequent Web traversal walks from Web logs. PM constructs a tree structure in memory from the Web logs and generates the frequent Web traversal walks from that tree. Experimental results show that the proposed methods perform well in efficiency and scalability.
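The two strategies can be illustrated with a rough Python sketch. This is not the thesis' AM or PM, whose details the abstract does not give. It assumes a walk is a contiguous run of pages in a session and that support counts sessions. `frequent_walks_levelwise` is a generic Apriori-style level-wise search, and `build_tree` with `frequent_walks_tree` uses a plain suffix trie, chosen for clarity rather than compactness. All names and the sample sessions are hypothetical.

```python
from collections import defaultdict

def frequent_walks_levelwise(sessions, min_support):
    """Apriori-style search: one pass over the sessions per walk length."""
    frequent = {}
    candidates = {(p,) for s in sessions for p in s}  # walks of length 1
    k = 1
    while candidates:
        counts = defaultdict(int)
        for s in sessions:
            windows = {tuple(s[i:i + k]) for i in range(len(s) - k + 1)}
            for w in windows & candidates:
                counts[w] += 1  # each session counts once per walk
        level = {w: c for w, c in counts.items() if c >= min_support}
        frequent.update(level)
        # Join step: extend a by the last page of b when they overlap,
        # so both the prefix and the suffix of a candidate are frequent.
        candidates = {a + (b[-1],) for a in level for b in level
                      if a[1:] == b[:-1]}
        k += 1
    return frequent

class Node:
    def __init__(self):
        self.children = {}
        self.count = 0       # sessions containing the walk ending here
        self.last_sid = -1   # last session that touched this node

def build_tree(sessions):
    """Insert every suffix of every session into an in-memory trie."""
    root = Node()
    for sid, s in enumerate(sessions):
        for start in range(len(s)):
            node = root
            for page in s[start:]:
                node = node.children.setdefault(page, Node())
                if node.last_sid != sid:  # count a session only once
                    node.last_sid = sid
                    node.count += 1
    return root

def frequent_walks_tree(root, min_support):
    """Read frequent walks off the trie, pruning infrequent subtrees."""
    frequent, stack = {}, [((), root)]
    while stack:
        walk, node = stack.pop()
        for page, child in node.children.items():
            if child.count >= min_support:
                frequent[walk + (page,)] = child.count
                stack.append((walk + (page,), child))
    return frequent

# Hypothetical sessions for illustration only.
sessions = [["A", "B", "A", "C"], ["A", "B", "C"], ["B", "A", "C", "D"]]
by_levels = frequent_walks_levelwise(sessions, min_support=2)
by_tree = frequent_walks_tree(build_tree(sessions), min_support=2)
print(by_levels == by_tree)
```

The level-wise version rescans the sessions once per walk length, which suits data streamed from disk. The tree version scans the data once and then works entirely in memory, which mirrors the split between the two cases described above.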
Identifier | oai:union.ndltd.org:CHENGCHI/A2002001570 |
Creators | 李華富, Lee, Hua-Fu |
Publisher | 國立政治大學 |
Source Sets | National Chengchi University Libraries |
Language | English |
Detected Language | English |
Type | text |
Rights | Copyright © nccu library on behalf of the copyright holders |