Topic management is the task of gathering, evaluating, organizing, and sharing a set of web sites for a specific topic. Current web tools do not provide adequate support for this task. We created, and continue to develop, the TopicShop system to address this need. TopicShop includes (1) a web crawler/analyzer that discovers relevant web sites and builds site profiles, and (2) user interfaces for information workspaces. We conducted an empirical pilot study comparing user performance with TopicShop vs. Yahoo!, and used its results to improve the design of TopicShop. Based on the results and user comments from the pilot study, a number of key design changes were incorporated into a second version of TopicShop, reflecting three lessons: (1) the tasks of evaluation and organization should be treated as integral rather than separable, (2) spatial organization is important to users and must be well supported in the interface, and (3) distinct user and global datasets help users deal with the large quantity of information available on the web. A full empirical study using the second iteration of TopicShop covered more areas of the World Wide Web and validated the results from the pilot study. Across the two studies, TopicShop subjects found over 80% more high-quality sites (where quality was determined by independent expert judgements) while browsing only 81% as many sites and completing their task in 89% of the time. The site profile data that TopicShop provides, in particular the number of pages on a site and the number of other sites that link to it, were the key to these results, as users exploited them to identify the most promising sites quickly and easily. We also evaluated a number of link- and content-based algorithms using a dataset of web documents rated for quality by human topic experts. Link-based metrics did a good job of picking out high-quality items.
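The two site-profile features named above (the number of pages on a site and the number of other sites that link to it) can be derived from a crawl's page list and link list. The sketch below is a minimal illustration under stated assumptions, not TopicShop's crawler/analyzer: it assumes a "site" is identified by its URL hostname, counts only links that cross from one site to another, profiles only sites that were actually crawled, and uses hypothetical names (`site_of`, `build_site_profiles`) and invented toy URLs.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    # Assumption (hypothetical): a "site" is identified by its lowercased hostname.
    return (urlsplit(url).hostname or "").lower()


def build_site_profiles(pages, links):
    """Return {site: {"pages": int, "inlinking_sites": int}} for crawled sites.

    pages: iterable of crawled page URLs.
    links: iterable of (source_url, target_url) hyperlinks.
    """
    page_urls = defaultdict(set)  # site -> distinct page URLs crawled on it
    for url in pages:
        page_urls[site_of(url)].add(url)

    linkers = defaultdict(set)  # site -> distinct *other* sites that link to it
    for src, dst in links:
        src_site, dst_site = site_of(src), site_of(dst)
        if src_site != dst_site:  # links within one site are ignored
            linkers[dst_site].add(src_site)

    # Sites seen only as link sources/targets (never crawled) are not profiled.
    return {
        site: {"pages": len(urls), "inlinking_sites": len(linkers.get(site, ()))}
        for site, urls in page_urls.items()
    }


# Invented toy crawl, for illustration only.
pages = ["http://a.example/", "http://a.example/faq", "http://b.example/"]
links = [
    ("http://b.example/", "http://a.example/faq"),
    ("http://c.example/", "http://a.example/"),
    ("http://a.example/", "http://a.example/faq"),  # internal link, ignored
]
profiles = build_site_profiles(pages, links)

# A simple link-based ordering: most in-linking sites first.
ranked = sorted(profiles, key=lambda s: profiles[s]["inlinking_sites"], reverse=True)
# ranked == ["a.example", "b.example"]
```

Counting distinct linking sites rather than raw links keeps one heavily self-referencing site from inflating another site's score; whether TopicShop applied exactly this rule is not stated here.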
For these link-based metrics, precision at 5 (a common information retrieval metric: the fraction of the top 5 ranked items that are actually high quality) is about 0.75, and precision at 10 is about 0.55; by comparison, only 32% of all documents in this dataset were of high quality. Surprisingly, a simple content-based metric, which ranked documents by the total number of pages on their containing site, performed nearly as well. These studies give insight into users' needs for the task of topic management, and provide empirical evidence of the effectiveness of task-specific interfaces (such as TopicShop) for managing topical collections. / Ph. D.
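Precision at k and the page-count ranking described above can be made concrete with a short sketch. The function names (`precision_at_k`, `rank_by_site_size`) and the six documents are hypothetical toy data, not the study's dataset or code. Documents are ordered by the page count of their containing site and the top k are scored against expert labels. For reference, a random ordering has an expected precision at k equal to the base rate of high-quality items (32% in the dataset described here), which is the baseline against which 0.75 and 0.55 should be read.

```python
def precision_at_k(ranked_docs, is_high_quality, k):
    """Fraction of the top-k ranked documents that experts rated high quality."""
    if k <= 0 or k > len(ranked_docs):
        raise ValueError("k must be between 1 and the number of ranked documents")
    hits = sum(1 for doc in ranked_docs[:k] if is_high_quality[doc])
    return hits / k


def rank_by_site_size(site_pages):
    """Order documents by the page count of their containing site, largest first.

    site_pages maps each document to the total number of pages on its site.
    Python's sort is stable (also with reverse=True), so ties keep input order.
    """
    return sorted(site_pages, key=site_pages.get, reverse=True)


# Invented toy data: document -> pages on its containing site, and expert labels.
site_pages = {"d1": 120, "d2": 3, "d3": 45, "d4": 800, "d5": 10, "d6": 60}
expert_label = {"d1": True, "d2": False, "d3": False, "d4": True, "d5": False, "d6": True}

ranked = rank_by_site_size(site_pages)     # ["d4", "d1", "d6", "d3", "d5", "d2"]
p3 = precision_at_k(ranked, expert_label, 3)  # 3 of top 3 are high quality -> 1.0
p5 = precision_at_k(ranked, expert_label, 5)  # 3 of top 5 are high quality -> 0.6
```

Dividing by k rather than by the number of returned items is the usual convention; the sketch simply rejects k larger than the ranked list instead of choosing a padding rule.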
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/29871 |
Date | 15 December 2003 |
Creators | Amento, Brian |
Contributors | Computer Science, Hix, Deborah S., Schulman, Robert S., Hartson, H. Rex, Terveen, Loren, Ehrich, Roger W. |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Dissertation |
Format | application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |
Relation | amento-dissertation-final.pdf |