411. Social bookmarking in the enterprise. Braly, Michael D.; Froh, Geoffrey B. (January 2006)
In this practitioner-oriented overview of a pilot project at a medium-sized software company, we outline the early phases of an effort to implement a Social Bookmarking System (SBS) within an enterprise. In particular, we discuss some of the unexpected challenges encountered with regard to potential user adoption, and the design strategy we used to address those challenges.
1. Introduction: Findability in the enterprise intranet has become an increasingly critical issue with the growth in size and complexity of corporate information environments. To date, much of the solution space has focused on approaches such as the construction of rich, domain-specific taxonomies and the development of sophisticated full-text search algorithms [1]. These methods can be extremely expensive and require careful ongoing maintenance to succeed. While they have proved valuable, some organizations are beginning to seek out new innovations [2].
Social Bookmarking Systems (SBS) are a class of collaborative applications that allow users to save, access, share and describe shortcuts to web resources. Initially conceived as personal information management tools, they were designed to function as centralized storage repositories to simplify the collection of bookmarks for users who browse the Internet with more than one machine in different locations. Later, systems such as the now archetypical del.icio.us [3] added two key features: 1) description of bookmarks with arbitrary free keywords ("tagging"), and 2) sharing of bookmarks and tags across users.
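The underlying data model of such a system is small. The sketch below (in Python, with all names our own illustration rather than any particular product's API) shows the two key features just described: free-keyword tagging and cross-user sharing.

```python
from collections import defaultdict

class BookmarkStore:
    """Toy social bookmarking store: users save URLs with free-form tags."""

    def __init__(self):
        # (user, url) -> set of tags applied by that user
        self.tags = defaultdict(set)
        # tag -> set of urls, for tag-based lookup across all users
        self.by_tag = defaultdict(set)

    def bookmark(self, user, url, *tags):
        self.tags[(user, url)].update(tags)
        for tag in tags:
            self.by_tag[tag].add(url)

    def urls_for_tag(self, tag):
        """Shared view: every URL any user has tagged with `tag`."""
        return self.by_tag[tag]

store = BookmarkStore()
store.bookmark("alice", "http://intranet/spec.html", "requirements", "project-x")
store.bookmark("bob", "http://intranet/spec.html", "specs")
print(store.urls_for_tag("requirements"))
```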
We decided to undertake a small pilot project within our own enterprise to determine whether an SBS might aid in refindability, term extraction, and identification of communities of practice. Recent technology experiments such as IBM's Dogear [4] have suggested some promise for del.icio.us-style systems inside the corporate firewall.
2. Assessing User Readiness: One of the attractive features of social software is that it tends to be inexpensive to implement from a technical standpoint. However, because its success relies entirely on user participation, the organizational cost can be quite high. Therefore, instead of moving directly into implementation, we first conducted a user survey and a series of interviews to both validate the deficiency in existing information retrieval mechanisms and gauge receptivity to bookmarking as a possible solution. Rather than obtaining definitive data about user attitudes towards tagging, we found it difficult to elicit constructive feedback because most users, even those familiar with existing systems such as del.icio.us, did not fundamentally understand core social bookmarking concepts.
3. Communicating Concepts to Users: Based on our initial findings, we modified our project plan to focus efforts on user education. We employed a non-traditional design approach in which we identified the central features of an SBS, mapped those features to user activities, and then translated the activity scenarios into graphical comics. In architecting complex systems, comics can communicate concepts more effectively by abstracting away technical details such as the user interface [5].
4. Future Work and Implications: This education strategy has been incorporated into the roadmap for future phases of the project, which also includes milestones related to technical extensibility, data collection, and internal marketing to drive usage.
We believe that the most critical aspect of implementing social classification within an enterprise context may be preparing users to both understand and embrace tagging as a conceptual framework.
412. The freshness of Web search engine databases. Lewandowski, Dirk; Wahlig, Henry; Meyer-Bautor, Gunnar (January 2005)
This is a preprint of an article published in the Journal of Information Science, Vol. 32, No. 2, 131-148 (2006). This study measures the frequency with which search engines update their indices. To this end, 38 websites that are updated on a daily basis were analysed over a time-span of six weeks. The search engines analysed were Google, Yahoo and MSN. We find that Google performs best overall, with the most pages updated on a daily basis, but only MSN is able to update all pages within a time-span of less than 20 days. The other two engines have outliers that are considerably older. In terms of indexing patterns, we find different approaches at the different engines: while MSN shows clear update patterns, Google shows some outliers, and the update process of the Yahoo index appears quite chaotic. The implications are that the quality of search engine indices varies, and that more than one engine should be used when searching for current content.
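The measurement behind these findings can be approximated as follows. This is a simplified sketch of the general method described above, not the authors' code, and all field names are our own assumptions: for each daily-updated site, compare the date of each check with the date of the engine's cached copy, then summarize the age distribution per engine.

```python
from datetime import date

# observations[engine][site] = list of (check_date, cached_copy_date) pairs,
# one pair per daily check over the study window
def staleness_stats(observations):
    """Per engine: share of up-to-date snapshots and the worst-case age in days."""
    stats = {}
    for engine, sites in observations.items():
        ages = [
            (checked - cached).days
            for pairs in sites.values()
            for checked, cached in pairs
        ]
        stats[engine] = {
            "fresh_share": sum(a <= 1 for a in ages) / len(ages),
            "max_age_days": max(ages),
        }
    return stats

obs = {"EngineA": {"site1": [(date(2005, 10, 2), date(2005, 10, 1))]}}
print(staleness_stats(obs))
```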
413. Beyond Aboutness: Classifying Causal Links in the Service of Interdisciplinarity. Gnoli, Claudio; Szostak, Rick (January 2009)
Most scholarship, and almost all interdisciplinary scholarship, involves the investigation of causal relationships among phenomena. Yet existing classification systems in widespread use have not focused on classifying works in terms of causal relationships. In order to allow all users interested in a particular causal link to readily find (only) all relevant works, it is necessary to develop a classification of phenomena such that each phenomenon occurs in only one place, and a classification of the sorts of relationships that exist among phenomena. Such a classification would be of huge benefit to interdisciplinary scholars, and would also be useful for disciplinary scholars. In particular, it would enhance the rate of discovery of "undiscovered knowledge".
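One way to make such a scheme concrete is to index each work by cause-relation-effect triples drawn from controlled vocabularies in which each phenomenon has exactly one place. The sketch below is our own illustration of the general idea, not the authors' notation:

```python
from collections import defaultdict

# Each work is indexed by (cause, relation, effect) triples drawn from
# controlled vocabularies where each phenomenon occurs in only one place.
index = defaultdict(set)

def classify(work_id, cause, relation, effect):
    index[(cause, relation, effect)].add(work_id)

classify("w1", "deforestation", "increases", "soil erosion")
classify("w2", "deforestation", "increases", "soil erosion")
classify("w3", "soil erosion", "reduces", "crop yield")

# Retrieve (only) all works on a particular causal link:
print(index[("deforestation", "increases", "soil erosion")])
```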
414. A sentiment-based meta search engine. Na, Jin-Cheon; Khoo, Christopher S.G.; Chan, Syin (January 2006)
This study is in the area of sentiment classification: classifying online review documents according to the overall sentiment expressed in them. This paper presents a prototype sentiment-based meta search engine that has been developed to perform sentiment categorization of Web search results. It helps users quickly focus on recommended or non-recommended information by classifying Web search results into four categories: positive, negative, neutral, and non-review documents. It does this by using an automatic classifier based on a supervised machine learning algorithm, the Support Vector Machine (SVM). This paper also discusses various issues we encountered during prototype development, and presents our approaches for resolving them. A user evaluation of the prototype was carried out, with positive responses from users.
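A minimal sketch of this kind of four-way SVM classifier, using scikit-learn rather than the authors' implementation (the four labels come from the abstract; the training documents are placeholders):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder training data; a real system would use many labelled reviews.
docs = [
    "Great product, works perfectly, highly recommended",
    "Terrible quality, broke after one day, avoid",
    "It is okay, nothing special either way",
    "Company homepage with product specifications",
]
labels = ["positive", "negative", "neutral", "non-review"]

# TF-IDF features over unigrams and bigrams, fed to a linear SVM.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(docs, labels)

# Categorize incoming search results before display:
print(clf.predict(["Awful battery life, would not recommend"]))
```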
415. Multiple Presents: How Search Engines Re-write the Past. Hellsten, Iina; Leydesdorff, Loet; Wouters, Paul (January 2006)
To be published in New Media & Society, 8(6), 2006 (forthcoming). Abstract: Internet search engines function in a present which changes continuously. The search engines update their indices regularly, overwriting Web pages with newer ones, adding new pages to the index, and losing older ones. Some search engines can be used to search for information on the internet for specific periods of time. However, these "date stamps" are not determined by the first occurrence of the pages on the Web, but by the last date at which a page was updated or a new page was added, and at which the search engine's crawler registered this change in the database. This has major implications for the use of search engines in scholarly research, as well as theoretical implications for conceptions of time and temporality. We examine the interplay between the different updating frequencies by using AltaVista and Google for searches at different moments in time. Both the retrieval of the results and the structure of the retrieved information erode over time.
416. Formulating Evaluation Measures for Structured Document Retrieval using Extended Structural Relevance. Ali, Mir Sadek (6 December 2012)
Structured document retrieval (SDR) systems minimize the effort users spend to locate relevant information by retrieving sub-documents (i.e., parts of documents, as opposed to entire documents) that focus the user's attention on the relevant parts of a retrieved document. SDR search tasks are differentiated by the multiplicity of ways in which users prefer to spend effort and gain relevant information. The sub-document retrieval paradigm has required researchers to undertake costly user studies to validate whether new IR measures, based on gain and effort, accurately capture IR performance.

We propose the Extended Structural Relevance (ESR) framework as a way, akin to classical set-based measures, to formulate SDR measures that share a common basis in our proposed pillars of SDR evaluation: relevance, navigation and redundancy. Our experimental results show how ESR provides a flexible way to formulate measures, and how it addresses the challenge of testing measures across related search tasks by replacing costly user studies with low-cost simulation.
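The ESR formulas themselves are beyond this abstract, but the flavour of a gain/effort measure built on relevance, navigation and redundancy can be sketched as follows. This is an illustration of the general idea only, not the ESR framework:

```python
def gain_effort_score(ranked_elements):
    """Toy SDR measure: each retrieved element contributes the relevant text
    it newly adds (redundant text the user has already seen counts for
    nothing), while every element visited costs one unit of navigation effort.
    ranked_elements: list of (relevant_units, fraction_already_seen) pairs."""
    gain, effort = 0.0, 0.0
    for relevant_units, seen in ranked_elements:
        gain += relevant_units * (1.0 - seen)  # redundancy-discounted gain
        effort += 1.0                          # one navigation step per element
    return gain / effort if effort else 0.0

# A run that retrieves an element, then a largely redundant child of it,
# then an irrelevant element:
print(gain_effort_score([(10, 0.0), (8, 0.9), (0, 0.0)]))
```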
417. Evaluating Information Retrieval Systems With Multiple Non-Expert Assessors. Li, Le (January 2013)
Many current test collections require the use of expert judgments during construction. The true label of each document is given by an expert assessor. However, the cost and effort associated with expert training and judging are typically quite high when a large number of documents must be judged. One way to address this issue is to have each document judged by multiple non-expert assessors at a lower expense. However, two key factors can make this method difficult: the variability of assessors' judging abilities, and the aggregation of the noisy labels into a single consensus label. Much previous work has shown how to use this method to replace expert labels in relevance evaluation. However, the effects of relevance judgment errors on ranking system evaluation have been less explored.
This thesis mainly investigates how best to evaluate information retrieval systems with noisy labels, where no ground-truth labels are provided and each document may receive multiple noisy labels. Based on our simulation results on two datasets, we find that conservative assessors, who tend to label incoming documents as non-relevant, are preferable, and that two important factors affect the overall conservativeness of the consensus labels: the assessors' conservativeness and the relevance standard. This observation essentially provides a guideline on what kinds of consensus algorithms or assessors are needed in order to preserve a high correlation with expert labels in ranking system evaluation. We also systematically investigate how to find consensus labels for documents that are equally likely to be relevant or non-relevant. We investigate a content-based consensus algorithm that links the noisy labels with document content. We compare it against state-of-the-art consensus algorithms and find that, depending on the document collection, this content-based approach may help or hurt performance.
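The aggregation step can be illustrated with the simplest consensus rule, majority voting with a conservative tie-break (ties resolved to non-relevant), which matches the observation above that conservative labelling is preferable. The sketch is illustrative, not one of the algorithms evaluated in the thesis:

```python
def consensus(labels, tie_break="non-relevant"):
    """Majority vote over noisy binary labels; ties go to the conservative
    choice, i.e. the document is judged non-relevant."""
    relevant = sum(1 for label in labels if label == "relevant")
    non_relevant = len(labels) - relevant
    if relevant > non_relevant:
        return "relevant"
    if non_relevant > relevant:
        return "non-relevant"
    return tie_break

print(consensus(["relevant", "non-relevant", "relevant"]))  # relevant
print(consensus(["relevant", "non-relevant"]))              # non-relevant (tie)
```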
418. An n-gram Based Approach to the Automatic Classification of Web Pages by Genre. Mason, Jane E. (10 December 2009)
The extraordinary growth in both the size and popularity of the World Wide Web has generated a growing interest in the identification of Web page genres, and in the use of these genres to classify Web pages. Web page genre classification is a potentially powerful tool for filtering the results of online searches. Although most information retrieval searches are topic-based, users are typically looking for a specific type of information with regard to a particular query, and genre can provide a complementary dimension along which to categorize Web pages. Web page genre classification could also aid in the automated summarization and indexing of Web pages, and in improving the automatic extraction of metadata.
The hypothesis of this thesis is that a byte n-gram representation of a Web page can be used effectively to classify the Web page by its genre(s). The goal of this thesis was to develop an approach to the problem of Web page genre classification that is effective not only on balanced, single-label corpora, but also on unbalanced and multi-label corpora, which better represent a real world environment. This thesis research develops n-gram representations for Web pages and Web page genres, and based on these representations, a new approach to the classification of Web pages by genre is developed.
The research includes an exhaustive examination of the questions associated with developing the new classification model, including the length, number, and type of the n-grams with which each Web page and Web page genre is represented, the method of computing the distance (dissimilarity) between two n-gram representations, and the feature selection method with which to choose these n-grams. The effect of preprocessing the data is also studied. Techniques for setting genre thresholds, in order to allow a Web page to belong to more than one genre or to no genre at all, are also investigated, and the classification performance of the new model is compared with that of the popular support vector machine approach. Experiments are also conducted on highly unbalanced corpora, both with and without the inclusion of noise Web pages.
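The core representation can be sketched as byte n-gram frequency profiles compared by a simple dissimilarity. The particular distance and parameter choices below are our own illustration; the thesis examines several alternatives:

```python
from collections import Counter

def byte_ngram_profile(data: bytes, n: int = 4, top: int = 500):
    """Most frequent byte n-grams of a page, as relative frequencies."""
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.most_common(top)}

def dissimilarity(p, q):
    """Simple L1-style distance over the union of two n-gram profiles."""
    grams = set(p) | set(q)
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in grams)

page = b"<html><body>Buy now! Free shipping on all orders.</body></html>"
genre = byte_ngram_profile(b"<html><body>Special offer! Buy today.</body></html>")
print(dissimilarity(byte_ngram_profile(page), genre))
```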
419. MedicInfoSys: An Architecture for an Evidence-Based Medical Information Research and Delivery System. Edwards, Pif (3 August 2010)
Due to the complicated nature of medical information needs, the time constraints of clinicians, and the linguistic complexities and sheer volume of medical information, most medical questions go unanswered. It has been shown that nearly all of these questions can be answered with presently available medical sources, and that when these questions are answered, patient health benefits.
In this work, we design and describe a framework for evidence-based medical information research and delivery, MedicInfoSys. This system leverages the strengths of knowledge workers and of mature knowledge-based technologies within the medical domain. The most critical element of this framework is a search interface, PifMed. PifMed uses the gold-standard MeSH categorization (presently integrated into MEDLINE) as the basis of a navigational structure that allows users to browse search results with an interactive tree of categories. Evaluation by user study shows it to be superior to PubMed in terms of speed and usability.
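The navigational idea behind PifMed can be sketched as grouping search results under their MeSH tree numbers so that the hierarchy itself becomes the browsing structure. The records and tree codes below are made up for illustration:

```python
from collections import defaultdict

# Each result carries MeSH tree numbers (e.g. "C14.280" lives under "C14").
results = [
    ("PMID:1", ["C14.280"]),      # a heart-disease branch
    ("PMID:2", ["C14.280.067"]),  # deeper in the same branch
    ("PMID:3", ["C08.381"]),      # a lung-disease branch
]

tree = defaultdict(set)
for pmid, codes in results:
    for code in codes:
        parts = code.split(".")
        # Attach the result to every ancestor category on its path,
        # so expanding any node of the tree shows all results beneath it.
        for depth in range(1, len(parts) + 1):
            tree[".".join(parts[:depth])].add(pmid)

# Expanding the "C14" node of the interactive tree shows:
print(sorted(tree["C14"]))  # ['PMID:1', 'PMID:2']
```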
420. Automated Story-based Commentary for Sports. Lee, Gregory M. K. (Unknown Date)
No description available.