1.
Scalability of Stepping Stones and Pathways
Venkatachalam, Logambigai, 30 May 2008
Information Retrieval (IR) plays a key role in serving large communities of users who need relevant answers to their search queries. IR encompasses various search models for different requirements and has introduced a variety of supporting tools to improve effectiveness and efficiency. Search is the central focus of IR. The classic search methodology takes an input query, processes it, and returns a ranked list of documents. However, this approach does not effectively support finding document associations (relationships between concepts or queries), whether direct or indirect. The Stepping Stones and Pathways (SSP) retrieval methodology instead retrieves ranked chains of documents that support valid relationships between any two given concepts. SSP has many potential practical and research applications wherever a tool is needed to find connections between two concepts. The early SSP "proof-of-concept" implementation could handle only 6,000 documents, whereas commercial search applications must deal with millions. Removing this limit on dataset size is therefore essential. A review of commercial search applications and their scalability indicates that the Lucene search toolkit is widely used because of its scalability, performance, and extensibility. Many web-based and desktop applications have used this toolkit successfully, including Wikipedia search, job search sites, digital libraries, e-commerce sites, and the Eclipse Integrated Development Environment (IDE). The goal of this research is to re-implement SSP in a scalable way, so that it works on larger datasets and can also be deployed commercially.
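The abstract does not say how SSP builds its chains of documents. Purely as an illustration of the idea of a document chain linking two concepts, the sketch below assumes each document is a set of index terms, treats two documents as linked whenever they share a term (a simplifying assumption, not SSP's rule), and uses breadth-first search to find the shortest chain from a document mentioning one concept to a document mentioning the other. The function name `find_chain` and the toy data are hypothetical, and this is not SSP's ranking method.

```python
from collections import deque

def find_chain(docs, concept_a, concept_b):
    """Shortest chain of document ids leading from concept_a to concept_b.

    docs maps each document id to its set of index terms. Two documents are
    treated as linked when they share at least one term (a simplifying
    assumption). Returns None if no chain exists.
    """
    # Breadth-first search starting from every document that mentions concept_a.
    starts = [d for d, terms in docs.items() if concept_a in terms]
    parent = {d: None for d in starts}
    queue = deque(starts)
    while queue:
        current = queue.popleft()
        if concept_b in docs[current]:
            # Follow parent links back to a start document to rebuild the chain.
            chain = []
            while current is not None:
                chain.append(current)
                current = parent[current]
            return chain[::-1]
        for other, terms in docs.items():
            if other not in parent and docs[current] & terms:
                parent[other] = current
                queue.append(other)
    return None

# Toy collection: d1 and d2 share "pathway", so they form a two-step chain.
docs = {
    "d1": {"gene", "pathway"},
    "d2": {"pathway", "disease"},
    "d3": {"weather"},
}
print(find_chain(docs, "gene", "disease"))  # ['d1', 'd2']
```

Because breadth-first search explores chains in order of length, shorter chains are found first; a ranked list of several chains would need a different search, such as enumerating paths up to a length bound and scoring them.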
This work explains the approach adopted for the re-implementation, focusing on scalable indexing and searching components, new ways to process citations (references), a new approach to query expansion, document clustering, and document similarity calculation. Experiments measuring runtime and storage demonstrated that the system can scale to millions of documents. / Master of Science
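The abstract lists document similarity calculation among the re-implemented components but does not give the formula. A common choice, assumed here rather than taken from the thesis, is cosine similarity between TF-IDF vectors. The sketch uses plain Python instead of Lucene's classes; `tfidf_vectors` and `cosine` are hypothetical helper names.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per token list."""
    n = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in docs for term in set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        # Raw term frequency times log(N / df); a term found in every document gets weight 0.
        vectors.append({t: count * math.log(n / df[t]) for t, count in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [
    ["lucene", "index", "search"],
    ["lucene", "index", "scale"],
    ["cooking", "recipe"],
]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 2))  # 0.21
print(cosine(vecs[0], vecs[2]))            # 0.0
```

In an index-backed system the term and document frequencies would normally be read from the index rather than recomputed in memory; this sketch only shows the arithmetic.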
2.
Ontology-Based Document Annotation Using Citation Context (Annotation de documents par le contexte de citation basée sur une ontologie)
Abrouk, Lylia, 27 November 2006
This thesis presents an approach and tools for annotating documents using ontologies. In our context, this means documents are annotated with a set of key concepts drawn from the domain ontology. We address the annotation problem by developing an approach based on the citation relationship. This relationship forms the basis of a method for refining how annotations propagate between documents. The approach is independent of document content and uses a thematic grouping of references built by unsupervised fuzzy clustering. Since the annotation relies on ontologies, we also addressed the problem of enriching the ontology, so that it can account for changes in the documents and refine the annotation phase. A tool named RAS (Reference Annotation System) was developed, and experiments were carried out using the Citeseer database.
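The abstract says references are grouped thematically by unsupervised fuzzy clustering but does not name the algorithm. Fuzzy c-means is one standard algorithm of that kind, so the sketch below uses it as an assumption; it is not RAS's code. It also assumes each reference has already been mapped to a numeric feature vector; that representation, the function name `fuzzy_c_means`, and the toy points are all hypothetical. Unlike hard clustering, every item receives a membership degree in every cluster, which is what lets one reference belong partly to several themes.

```python
import math

def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Soft-cluster points: every point gets a membership degree in every cluster.

    points: list of equal-length numeric tuples (hypothetical reference features).
    centers: initial cluster centers chosen by the caller.
    m: fuzzifier, must be > 1; larger values give softer memberships.
    Returns (memberships, centers); memberships[i][j] is the degree to which
    point i belongs to cluster j, and each row sums to 1.
    """
    centers = [tuple(c) for c in centers]
    dims = len(points[0])
    memberships = []
    for _ in range(iterations):
        # Membership step: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if 0.0 in dists:
                # Point sits exactly on one or more centers: split membership among them.
                hits = dists.count(0.0)
                row = [1.0 / hits if d == 0.0 else 0.0 for d in dists]
            else:
                row = [1.0 / sum((d_j / d_k) ** (2 / (m - 1)) for d_k in dists)
                       for d_j in dists]
            memberships.append(row)
        # Center step: average of all points weighted by membership ** m.
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            if total > 0:
                centers[j] = tuple(sum(w * p[k] for w, p in zip(weights, points)) / total
                                   for k in range(dims))
    return memberships, centers

# Toy 2-D features: two tight groups plus one point halfway between them.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (2.5, 2.5)]
memberships, centers = fuzzy_c_means(points, centers=[points[0], points[2]])
for p, row in zip(points, memberships):
    print(p, [round(u, 2) for u in row])
```

With this toy data, the two tight pairs end up with memberships close to 1 in their own cluster, while (2.5, 2.5) is split roughly evenly between the two, which is the kind of overlap a fuzzy grouping of references is meant to capture.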