Since the Web is getting bigger and bigger with a rapidly increasing number of heterogeneous Web pages, Web users often suffer from two problems: P1) irrelevant information and P2) information overload Irrelevant information indicates the weak relevance between the retrieved information and a user's information need. Information overload indicates that the retrieved information may contain 1) redundant information (e.g., common information between two retrieved Web pages) or 2) too much amount of information which cannot be easily understood by a user. We consider four major causes of those two problems P1) and P2) as follows; ??? Firstly, ambiguous query-terms. ??? Secondly, ambiguous terms in a Web page. ??? Thirdly, a query and a Web page cannot be semantically matched, because of the first and second causes. ??? Fourthly, the whole content of a Web page is a coarse context-boundary to measure the similarity between the Web page and a query. To answer those two problems P1) and P2), we consider that the meanings of words in a Web page and a query are primitive hints for understanding the related semantics of the Web page. Thus, in this dissertation, we developed three cooperative technologies: Word Sense Based Web Information Retrieval (WSBWIR), Subjective Segment Importance Model (SSIM) and Topic Focused Web Page Summarization (TFWPS). ??? WSBWIR allows for a user to 1) describe their information needs at senselevel and 2) provides one way for users to conceptually explore information existing within Web pages. ??? SSIM discovers a semantic structure of a Web page. A semantic structure respects not only Web page authors logical presentation structures but also a user specific topic interests on the Web pages at query time. ??? TFWPS dynamically generates extractive summaries respecting a user's topic interests. WSBWIR, SSIM and TFWPS technologies are implemented and experimented through several case-studies, classification and clustering tasks. Our experiments demonstrated that 1) the comparable effectiveness of exploration of Web pages using word senses, and 2) the segments partitioned by SSIM and summaries generated by TFWPS can provide more topically coherent features for classification and clustering purposes.
Identifer | oai:union.ndltd.org:ADTP/257198 |
Date | January 2007 |
Creators | Yoo, Seung Yeol, Computer Science & Engineering, Faculty of Engineering, UNSW |
Publisher | Awarded by:University of New South Wales. Computer Science and Engineering |
Source Sets | Australiasian Digital Theses Program |
Language | English |
Detected Language | English |
Rights | Copyright Seung Yeol Yoo, http://unsworks.unsw.edu.au/copyright |
Page generated in 0.0024 seconds