1 |
litsift: Automated Text Categorization in Bibliographic Search
Faulstich, Lukas C., Stadler, Peter F., Thurner, Caroline, Witwer, Christina
07 January 2019
In bioinformatics there exist research topics that cannot be uniquely characterized by a set of keywords, because relevant keywords are (i) also heavily used in other contexts and (ii) often omitted in relevant documents because the context is clear to the target audience. Information retrieval interfaces such as Entrez/PubMed produce either low precision or low recall in this case. To yield a high recall at a reasonable precision, the results of a broad information retrieval search have to be filtered to remove irrelevant documents. We use automated text categorization for this purpose. In this study we use the topic of conserved secondary RNA structures in viral genomes as a running example. PubMed result sets for two virus groups, Picornaviridae and Flaviviridae, have been manually labeled by human experts. We evaluated various classifiers from the Weka toolkit together with different feature selection methods to assess whether classifiers trained on documents dedicated to one virus group can be successfully applied to filter literature on other virus groups. Our results indicate that in this domain a bibliographic search tool trained on a reference corpus may significantly reduce the amount of time needed for extensive literature searches.
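The filtering idea described in this abstract (train a text classifier on labeled results for one virus group, then use it to filter results for another) can be illustrated with a minimal sketch. This is not the authors' method: the study evaluated Weka classifiers, whereas the code below substitutes a small hand-written multinomial naive Bayes in plain Python. The titles, labels, and all names (`NaiveBayesFilter`, `precision_recall`, `train_docs`, `test_docs`) are hypothetical and invented purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFilter:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, doc):
        total_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in tokenize(doc):
                if tok in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def precision_recall(predicted, actual, positive=True):
    """Precision and recall of the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labeled titles for the training virus group (True = relevant).
train_docs = [
    "conserved RNA secondary structure in the poliovirus genome",
    "stem loop structure element required for replication",
    "vaccine trial outcome in children",
    "epidemiology of outbreak in hospital patients",
]
train_labels = [True, True, False, False]

# Hypothetical titles for a different virus group, used only for testing.
test_docs = [
    "conserved stem loop structure in hepatitis C virus genome",
    "clinical outcome of patients after vaccine",
]
test_labels = [True, False]

model = NaiveBayesFilter().fit(train_docs, train_labels)
predicted = [model.predict(d) for d in test_docs]
precision, recall = precision_recall(predicted, test_labels)
print(predicted, precision, recall)
```

The toy corpus is far too small to say anything about real performance; it only shows the shape of the cross-group setup, in which the test documents come from a virus group the classifier never saw during training.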
|
2 |
Inducing Conceptual User Models
Müller, Martin Eric
29 April 2002
User modeling and machine learning for user modeling have both become important research topics and key techniques in recent adaptive systems. One of the most intriguing problems in the 'information age' is how to filter relevant information from the huge amount of available data. This problem is tackled by using models of the user's interest in order to increase precision and discriminate interesting information from uninteresting data. However, any user modeling approach suffers from several major drawbacks. First, user models built by the system need to be inspectable and understandable by the users themselves. Second, users in general are not willing to give feedback on their satisfaction with the delivered results; without any evidence of the user's interest, it is hard to induce a hypothetical user model at all. Finally, most current systems do not draw a line between domain knowledge and user model, which makes the adequacy of a user model hard to determine. This thesis presents the novel approach of conceptual user models. Conceptual user models are easy to inspect and understand and allow the system to explain its actions to the user. It is shown that ILP can be applied to the task of inducing user models from feedback, and a method for using mutual feedback for sample enlargement is introduced. Results are evaluated independently of domain knowledge within a clear machine learning problem definition. The whole concept is realized in a meta web search engine called OySTER.
|