INTRODUCTION
Medical professionals seek to capture papers that can be located via keyword or free-text search in digital libraries or on the web, but they are also interested in finding material that has not yet been indexed in online databases. Search engines return a multitude of results [1]. Social bookmarking, where users tag items for their own use, offers a way to locate new and relevant information. CiteULike (citeulike.org), a social bookmarking service, allows articles to be tagged with useful keywords for later retrieval.
RELATED STUDIES
A previous study [2] compared social bookmarking to existing information organisation structures and found both similarities and intriguing differences in terminology use. A sample of articles tagged on CiteULike was examined for contextual differences in keyword usage between users of social bookmarking sites, authors and indexers. Many tags were related to thesaurus terms (descriptors) but were not formally in the thesaurus [2].
This study examines how term usage patterns in tags, keywords and descriptors suggest a similar (or differing) context between users, authors and intermediaries.
METHODOLOGY
This study examines the use of tags assigned on CiteULike to articles from three medical or biology journals (JAMA, Proteins, and Journal of Molecular Biology) indexed in PubMed. A total of 1299 unique articles were retrieved from CiteULike; Medical Subject Headings (MeSH) were collected from PubMed. Articles were analysed using standard informetric techniques to examine the use of user-assigned tags and their PubMed-assigned MeSH index terms. Data was analysed for term usage and categorised to see what contextual clues users expose in their tag use.
RESULTS
Articles were tagged by up to 14 users (average 2-4). 1449 unique tags were used in the data set. Some articles were heavily tagged (max. 29 tags, min. 1, median 2). Descriptors were assigned more heavily than tags (2746 unique descriptors). Articles had, on average, 10 descriptors assigned (max. 40, min. 2).
Some tags occurred frequently: protein_structure (140), no-tag (134), and protein (114). By journal, the most frequent tags were: docking (Proteins, 85), no-tag (JAMA, 20), and protein_structure (J Mol Biol, 52). The no-tag value is assigned by the system and indicates that the user assigned no tag.
Descriptors were more heavily reused than tags, for example: 'Models, Molecular' (550), 'Protein Conformation' (363), and 'Humans' (341). By journal, the most frequent descriptors were: 'Models, Molecular' (Proteins, 252), 'Models, Molecular' (J Mol Biol, 235), and 'Humans' (JAMA, 137).
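Counts of the kind reported above can be produced with a simple frequency tally. The following Python sketch is illustrative only: the record layout, the function name tag_statistics, and the toy records are assumptions made for this example, not the study's data or tooling.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical toy records: (journal, article id, tags). Not the study's data.
records = [
    ("Proteins", "a1", ["docking", "protein_structure"]),
    ("Proteins", "a2", ["docking"]),
    ("JAMA", "a3", ["no-tag"]),
    ("J Mol Biol", "a4", ["protein_structure", "protein", "folding"]),
]

def tag_statistics(records):
    """Tally tag frequencies overall and per journal, plus tags-per-article stats."""
    overall = Counter()
    by_journal = defaultdict(Counter)
    tags_per_article = []
    for journal, _article_id, tags in records:
        overall.update(tags)
        by_journal[journal].update(tags)
        tags_per_article.append(len(tags))
    return {
        "unique_tags": len(overall),
        "max": max(tags_per_article),
        "min": min(tags_per_article),
        "median": median(tags_per_article),
        "top_overall": overall.most_common(3),
        "top_by_journal": {j: c.most_common(1) for j, c in by_journal.items()},
    }

stats = tag_statistics(records)
```

The same tally applies to descriptors by substituting the MeSH terms collected for each article in place of the tag lists.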
DISCUSSIONS AND CONCLUSIONS
Comparison of tag and descriptor lists shows many of the same similarities and differences as the previous study [2]. Many user terms were related to the author and intermediary terms but were not in the thesaurus (e.g. 'diet' and 'fat' were used separately in the tag lists, whereas they were linked as 'dietary fats' in the thesaurus). Terms such as 'human' and 'family-studies' show that users tagging biology articles are interested in the methodology and user groups associated with articles.
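The distinction between tags that match a descriptor and tags that are merely related to one can be approximated automatically. The sketch below is a crude heuristic assumed for illustration (token prefix matching with a hypothetical min_len threshold); it is not the categorisation procedure used in the study, and the function names are invented for this example.

```python
import re

def tokens(term):
    """Lowercase a term and split it on any non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", term.lower()) if t]

def classify_tag(tag, descriptors, min_len=3):
    """Return ('exact' | 'related' | 'unmatched', matching descriptor or None).

    Assumed heuristic: 'exact' if the token lists are identical; 'related' if
    any tag token and descriptor token share a prefix relationship.
    """
    tag_tokens = tokens(tag)
    for d in descriptors:
        if tokens(d) == tag_tokens:
            return "exact", d
    for d in descriptors:
        for dt in tokens(d):
            for tt in tag_tokens:
                shorter, longer = sorted((tt, dt), key=len)
                if len(shorter) >= min_len and longer.startswith(shorter):
                    return "related", d
    return "unmatched", None

descriptors = ["Dietary Fats", "Protein Conformation"]
diet_result = classify_tag("diet", descriptors)
fat_result = classify_tag("fat", descriptors)
```

Under this heuristic both 'diet' and 'fat' are classed as related to 'Dietary Fats' rather than as exact matches, which mirrors the example in the text. Prefix matching will produce false positives on real data, so its output would need manual review.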
This study has system design implications for accessing, indexing and searching document spaces. Users express frustration trying to narrow search results. Controlled vocabularies help narrow a search to a manageable size but can be expensive. User tagging could provide additional access points to traditional controlled vocabularies and the associative classifications necessary to tie documents and articles to time and task relationships among other novel items.
REFERENCES
[1] Tang H, Ng JHK. 2006. Googling for a diagnosis -- use of Google as a diagnostic aid: internet based study. BMJ 333 (2 Dec), 1143-1145.
[2] Kipp MEI. 2006. Complementary or discrete contexts in online indexing: A comparison of user, creator, and intermediary keywords. Canadian Journal of Information and Library Science (in press) http://dlist.sir.arizona.edu/1533/
Identifier | oai:union.ndltd.org:arizona.edu/oai:arizona.openrepository.com:10150/106337 |
Date | January 2007 |
Creators | Kipp, Margaret E. I. |
Source Sets | University of Arizona |
Language | English |
Detected Language | English |
Type | Conference Poster |