Global ETD Search

1	Using semantics to enhance query reformulation in dynamic distributed environments Fernandes, Damires Yluska de Souza 31 January 2009 (has links) Query processing has been addressed as a central problem in dynamic distributed environments. The critical point of this processing, however, is reformulating a query submitted at a source peer in terms of a target peer, taking into account the correspondences that exist between them. Traditional approaches generally perform the reformulation using equivalence correspondences. However, concepts from a source peer do not always have equivalent counterparts in the target peer, which can produce an empty reformulation and, possibly, no answer for the user. In that case, if the user is interested in receiving related answers, even imprecise ones, it is better to generate an adapted or enriched reformulation, and consequently approximate answers, than none at all. Within this scope, this work proposes a semantics-based approach, named SemRef, that integrates query enrichment and reformulation techniques in order to provide users with a set of expanded answers. Exact and enriched reformulations are produced to reach this set. To that end, we use semantics obtained mainly from a set of semantic correspondences that extend those normally found in the literature. Examples of unusual correspondences are closeness and disjointness. In addition, we use the context of the user, the query and the environment to support the reformulation process and to deal with information that is only obtained dynamically. 
We formalize the proposed definitions using the description logic ALC and present the algorithm underlying the proposed approach, guaranteeing its correctness and completeness through verified properties. We implemented the SemRef algorithm in a query submission and execution module of a Peer Data Management System (PDMS). We show examples that illustrate how the developed work operates and what its advantages are. Finally, we present the experiments carried out and the results obtained. Query Reformulation Semantics Context Dynamic Distributed Environments
2	Query Expansion : en jämförande studie av Automatisk Query Expansion med och utan relevans-feedback / Query Expansion : a comparative study of Automatic Query Expansion with and without relevance feedback Ekberg-Selander, Karin, Enberg, Johanna January 2007 (has links) In query expansion (QE), terms are added to an initial query in order to improve retrieval effectiveness. In this thesis we use QE in the sense that the query is reformulated by deleting the terms in the initial query and replacing them with terms from the documents retrieved in the initial run. The aim of this thesis is to study and compare, in an experimental full-text environment, the retrieval results of two different query expansion strategies. The following questions are addressed by the study: How do the two strategies perform in relation to each other regarding recall? What may be causing the result? Are the two strategies retrieving the same relevant documents? Two strategies are designed to simulate a searcher using automatic query expansion (AQE) either with or without relevance feedback. Strategy I simulates AQE without relevance feedback by taking the top five documents retrieved in the initial run and extracting the ten most frequently occurring terms in these to create a new query. Correspondingly, Strategy II simulates AQE with relevance feedback by taking the top five relevant documents and extracting the top ten terms in these to create a new query. It is concluded that the retrieval performance of both strategies improved for most of the topics. On average, Strategy II achieved 54.63 percent recall, compared with 45.59 percent for Strategy I. The two strategies retrieved different relevant documents for the majority of the topics. Hence, it would be reasonable to base a system on both of them. 
/ Uppsatsnivå: D query expansion query reformulation relevance feedback inquery återvinningseffektivitet information retrieval Social Sciences Samhällsvetenskap
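The two strategies described in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the `\w+` tokenizer, and the absence of stop-word removal or stemming are all hypothetical choices, and the thesis itself ran its experiments in a different retrieval system.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase word characters only.
    return re.findall(r"\w+", text.lower())


def expanded_query(ranked_docs, relevant_ids=None, n_docs=5, n_terms=10):
    """Build a replacement query from the top documents of an initial run.

    ranked_docs: list of (doc_id, text) pairs in rank order.
    relevant_ids: None for Strategy I (no relevance feedback); a set of
    doc ids judged relevant for Strategy II (with relevance feedback).
    """
    if relevant_ids is None:
        pool = [text for _, text in ranked_docs]
    else:
        pool = [text for doc_id, text in ranked_docs if doc_id in relevant_ids]

    # Count term occurrences over the first n_docs documents of the pool.
    counts = Counter()
    for text in pool[:n_docs]:
        counts.update(tokenize(text))

    # The new query consists of the most frequent terms only;
    # the terms of the initial query are discarded.
    return [term for term, _ in counts.most_common(n_terms)]
```

Calling `expanded_query(run)` corresponds to Strategy I, while `expanded_query(run, relevant_ids=judged)` corresponds to Strategy II, where `judged` stands for whatever relevance judgments are available.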
3	Relevance feedback-based optimization of search queries for Patents Cheng, Sijin January 2019 (has links) In this project, we design a search query optimization system based on the user's relevance feedback by generating customized query strings for existing patent alerts. First, the Rocchio algorithm is used to generate a search string by analyzing the characteristics of related and unrelated patents. Then a collaborative filtering recommendation algorithm is used to rank the query results; it considers previous relevance feedback and patent features, instead of only the similarity between query and patents as in the traditional method. To further explore the performance of the optimization system, we design and conduct a series of evaluation experiments with TF-IDF as the baseline method. Experiments show that, with the generated search strings, the proportion of unrelated patents in the search results is significantly reduced over time. Over 4 months, the precision of the retrieved results improved from 53.5% to 72%. Moreover, the ranking performance of the proposed method is better than that of the baseline. In terms of precision, the top-10 results of the recommendation algorithm are about 5 percentage points higher than the baseline, and the top-20 results are about 7.5% higher. It can be concluded that the proposed approach can effectively optimize patent search results by learning from relevance feedback. Patent Search Query Reformulation Recommendation System Matrix Decomposition Text Processing Computer Systems Datorsystem
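The Rocchio step mentioned in the abstract above can be illustrated with the textbook update rule, which moves a query vector toward the centroid of relevant documents and away from the centroid of non-relevant ones. This is a sketch of the standard formula only: the weights `alpha`, `beta`, `gamma`, the sparse dict representation, and the helper `top_terms` are assumptions for illustration, and the thesis's actual feature weighting and search-string generation are not described in the abstract.

```python
from collections import defaultdict


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Standard Rocchio update on sparse term-weight vectors (dicts).

    new_q = alpha * q + beta * mean(relevant) - gamma * mean(non_relevant)
    The default weights are conventional textbook values, not the thesis's.
    """
    updated = defaultdict(float)
    for term, weight in query.items():
        updated[term] += alpha * weight

    for docs, coef in ((relevant, beta), (non_relevant, -gamma)):
        if not docs:
            continue  # an empty feedback set contributes nothing
        for doc in docs:
            for term, weight in doc.items():
                updated[term] += coef * weight / len(docs)

    # Negative weights are commonly clipped; only positive terms are kept.
    return {term: w for term, w in updated.items() if w > 0}


def top_terms(vector, k):
    # Hypothetical helper: pick the k heaviest terms for a search string.
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

For example, with a query `{"battery": 1.0}`, one relevant patent `{"battery": 1.0, "lithium": 1.0}` and one unrelated patent `{"battery": 1.0, "toy": 1.0}`, the term `lithium` gains weight while `toy` falls below zero and is dropped.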
4	Information Quality Criteria Analysis in Query Reformulation in Dynamic Distributed Environments SOUZA, Bruno Felipe de França 09 September 2013 (has links) FACEPE / Ambientes dinâmicos e distribuídos são sistemas descentralizados que fornecem aos usuários recursos de consultas sobre um conjunto de fontes de dados heterogêneas, distribuídas e autônomas (peers). Sistemas de Integração de Dados, Peer Data Management System (PDMS) e Dataspaces são exemplos de tais sistemas. Eles são constituídos por peers que pertencem a um domínio específico e estão ligados entre si por meio de correspondências semânticas. No entanto, um desafio inerente em ambientes dinâmicos e distribuídos é o processo de reformulação de consulta entre um par de peers. Quando um usuário coloca uma consulta em um peer, a fim de adquirir mais informações, a consulta deve ser reformulada de acordo com o esquema dos peers vizinhos. Neste processo podem surgir alguns problemas como a perda semântica e a degradação da consulta. A perda semântica e degradação da consulta são problemas relacionados à perda de conceitos semânticos durante a reformulação. Por outro lado, em um ambiente semanticamente rico, ao invés de uma perda semântica, a consulta pode ter um enriquecimento semântico por meio da agregação de conceitos semanticamente relacionados durante a reformulação. 
Neste sentido, a consulta do usuário pode ser enriquecida e resultados semânticos mais ricos podem ser recuperados. Critérios de qualidade da informação têm sido usados em alguns trabalhos para avaliar o nível de qualidade dos elementos de um ambiente dinâmico e distribuído como, por exemplo, peers, dados e a resposta da consulta. Estes critérios são medidas dinâmicas proporcionadas pelo sistema e servem como uma pontuação que pode ser constantemente avaliada para obter o nível real de qualidade. Neste trabalho, apresentamos quatro critérios de qualidade da informação que medem a perda e o ganho de conceitos semânticos durante a reformulação da consulta entre os pares de peers. Nós apresentamos um exemplo da nossa abordagem e os algoritmos de avaliação de critérios. Também damos as nossas definições para os problemas de perda semântica e degradação da consulta. Por fim, apresentamos a experimentação que fizemos com o PDMS SPEED e os resultados obtidos. / Dynamic distributed environments are decentralized systems that provide users with querying capabilities over a set of heterogeneous, distributed and autonomous data sources (peers). Data Integration Systems, Peer Data Management Systems (PDMS) and Dataspaces are examples of such systems. They are composed of peers that belong to a specific domain and are linked to each other by correspondences (semantic connections). Nonetheless, a challenge inherent to dynamic distributed environments is the query reformulation process between a pair of peers. When a user poses a query at a peer, in order to acquire more information, the query should be reformulated in accordance with the schemas of the neighbor peers. In this process, problems such as semantic loss and query degradation can arise. Semantic loss and query degradation are problems related to the loss of semantic concepts during query reformulation. 
On the other hand, in such a semantic environment, instead of suffering a semantic loss, the query can be semantically enriched by aggregating semantically related concepts during reformulation. In this sense, the user's query can be enriched and semantically richer results can be delivered. Information Quality criteria have been used in some works to evaluate the level of quality of the elements of a dynamic distributed environment, such as peers, data and query answers. These criteria are dynamic measures provided by the system and serve as scores that can be constantly evaluated to get the actual level of quality. In this work we present four Information Quality criteria that measure the loss and enrichment of semantic concepts during query reformulation among peers. We present an example of our analysis and the algorithms that implement the evaluation of the presented criteria. We also give our definitions of the semantic loss and query degradation problems. Finally, we present the experiments we carried out with the SPEED PDMS and the results obtained. Query Reformulation Information Quality PDMS Distributed Dynamic Environment Semantic Correspondences Reformulação de Consultas Qualidade da Informação Ambiente Dinâmico e Distribuído Correspondências Semânticas
5	Reformulation sémantique des requêtes pour la recherche d’information ad hoc sur le Web / Sémantique query reformulation for ad hoc information retrieval on the Web Audeh, Bissan 09 September 2014 (has links) Dans le cadre d’une solution de modification de la requête, nous nous intéressons aux différentes façons d’utiliser la sémantique pour mieux exprimer le besoin d’information de l’utilisateur dans un contexte Web. Nous distinguons deux types de concepts : ceux identifiables dans une ressource sémantique comme une ontologie, et ceux que l’on extrait à partir d’un ensemble de documents de pseudo retour de pertinence. Nous proposons une Approche Sémantique Mixte d’Expansion et de Reformulation (ASMER) qui permet de modéliser l’utilisation de ces deux types de concepts dans une requête modifiée. Cette approche considère plusieurs défis liés à la modification automatique des requêtes, notamment le choix sélectif des termes d’expansion, le traitement des entités nommées et la reformulation de la requête finale.Bien que dans un contexte Web la précision soit le critère d’évaluation le plus adapté, nous avons aussi pris en compte le rappel pour étudier le comportement de notre approche sous plusieurs aspects. Ce choix a suscité une autre problématique liée à l’évaluation du rappel en recherche d’information. En constatant que les mesures précédentes ne répondent pas à nos contraintes, nous avons proposé la mesure MOR (Mesure Orientée Rappel), qui permet d’évaluer le rappel en tenant compte de la précision comme importante mais pas prioritaire dans un contexte dirigé rappel.En incluant MOR dans notre stratégie de test, nous avons évalué ASMER sur quatre collections Web issues des campagnes INEX et TREC. Nos expériences montrent qu’ASMER améliore la performance en précision par rapport aux requêtes originales et par rapport aux requêtes étendues par une méthode de l’état de l’art. 
/ As a query expansion and reformulation solution, we are interested in the different ways semantics can be used to translate the user's information need into a query. We distinguish two types of concepts: those that can be identified in a semantic resource such as an ontology, and those extracted from the document collection through a pseudo-relevance feedback procedure. We propose a mixed semantic approach to query expansion and reformulation (ASMER) that integrates these two types of concepts into an automatically modified query. Our approach addresses several challenges, notably selective term expansion, named entity treatment and query reformulation. Even though precision is the evaluation criterion best suited to a web context, we also evaluated recall in order to study the behavior of our model from different angles. This choice led us to a different problem, that of evaluating recall in information retrieval. After finding that existing measures do not satisfy our constraints, we proposed a new recall-oriented measure (MOR), which treats recall as the priority without ignoring precision. Among other measures, MOR was used to evaluate ASMER on four web collections from the standard evaluation campaigns INEX and TREC. Our experiments showed that ASMER improves precision over the unmodified original queries. In most cases, our approach achieved statistically significant improvements over a state-of-the-art query expansion method. In addition, ASMER retrieves the first relevant document at better ranks than the compared approaches, and it has slightly better recall according to the MOR measure. Recherche d'information Reformulation sémantique de la requête Retour de pertinence Ressources sémantiques Évaluation du rappel Information retrieval Semantic query reformulation Relevance feedback Semantic resources Recall evaluation
6	Approches hybrides pour la recherche sémantique de l'information : intégration des bases de connaissances et des ressources semi-structurées / Hybrid Approaches for Semantic Information Retrieval : Towards the Integration of Knowledge Bases and Semistructured Resources Mrabet, Yassine 12 July 2012 (has links) La recherche sémantique de l'information a connu un nouvel essor avec les nouvelles technologies du Web sémantique. Des langages standards permettent aujourd'hui aux logiciels de communiquer par le biais de données écrites dans le vocabulaire d'ontologies de domaine décrivant une sémantique explicite. Cet accès ``sémantique'' à l'information requiert la disponibilité de bases de connaissances décrivant les instances des ontologies de domaine. Cependant, ces bases de connaissances, bien que de plus en plus riches, contiennent relativement peu d'information par comparaison au volume des informations contenu dans les documents du Web.La recherche sémantique de l'information atteint ainsi certaines limites par comparaison à la recherche classique de l'information qui exploite plus largement ces documents. Ces limites se traduisent explicitement par l'absence d'instances de concepts et de relations dans les bases de connaissances construites à partir des documents du Web. Dans cette thèse nous étudions deux directions de recherche différentes afin de permettre de répondre à des requêtes sémantiques dans de tels cas. Notre première étude porte sur la reformulation des requêtes sémantiques des utilisateurs afin d'atteindre des parties de document pertinentes à la place des faits recherchés et manquants dans les bases de connaissances. La deuxième problématique que nous étudions est celle de l'enrichissement des bases de connaissances par des instances de relations.Nous proposons deux solutions pour ces problématiques en exploitant des documents semi-structurés annotés par des concepts ou des instances de concepts. 
Un des points clés de ces solutions est qu'elles permettent de découvrir des instances de relations sémantiques sans s'appuyer sur des régularités lexico-syntaxiques ou structurelles dans les documents. Nous situons ces deux approches dans la littérature et nous les évaluons avec plusieurs corpus réels extraits du Web. Les résultats obtenus sur des corpus de citations bibliographiques, des corpus d'appels à communication et des corpus géographiques montrent que ces solutions permettent effectivement de retrouver de nouvelles instances relations à partir de documents hétérogènes tout en contrôlant efficacement leur précision. / Semantic information retrieval has developed rapidly with the new Semantic Web technologies. With these technologies, software can exchange and use data that are written according to domain ontologies describing explicit semantics. This ``semantic'' information access requires the availability of knowledge bases describing both domain ontologies and their instances. Most often, these knowledge bases are constructed automatically by annotating document corpora. However, while these knowledge bases are growing, they still contain much less information than the HTML documents available on the surface Web. Thus, semantic information retrieval reaches some limits compared with ``classic'' information retrieval, which exploits these documents on a larger scale. In practice, these limits consist in the lack of concept and relation instances in the knowledge bases constructed from the same Web documents. In this thesis, we study two research directions in order to answer semantic queries in such cases. The first direction consists in reformulating semantic user queries in order to reach relevant document parts instead of the required (and missing) facts. 
The second direction that we study is the automatic enrichment of knowledge bases with relation instances. We propose two novel solutions, one for each of these research directions, by exploiting semi-structured documents annotated with concept instances. A key point of these solutions is that they do not require lexico-syntactic or structural regularities in the documents. We position these approaches with respect to the state of the art and evaluate them on several real corpora extracted from the Web. The results obtained on bibliographic citation, call-for-papers and geographic corpora show that these solutions make it possible to retrieve new answers and relation instances from heterogeneous documents and to rank them efficiently according to their precision. Ontologie Bases de connaissances Recherche sémantique d’information Enrichissement de bases de connaissances Reformulation de requêtes Document semi-structurés Ontologies Knowledge bases Semantic information retrieval Knowledge base enrichment Query reformulation Semi-structured documents
7	Εννοιολογικός προσανατολισμός της αναζήτησης στον Παγκόσμιο Ιστό Βεργέτη, Δανάη 09 October 2014 (has links) Tα τελευταία χρόνια, η εξάπλωση του διαδικτύου και το εύρος της πληροφορίας που διατίθεται στο χρήστη, καθιστούν αναγκαία τη χρησιμοποίηση σημασιολογικών τεχνικών προσωποποίησης, προκειμένου να βελτιώσουν την εμπειρία του χρήστη στο διαδίκτυο. Στις μηχανές αναζήτησης, οι χρήστες βελτιώνουν το επερώτημά τους με την προσθήκη, την αφαίρεση ή την αντικατάσταση των λέξεων. Παρ 'όλα αυτά , εκτός από την αλληλεπίδραση με μια μηχανή αναζήτησης, η εμπειρία ενός χρήστη στο διαδίκτυο κατά την αναζήτηση της σωστής πληροφορίας, περιλαμβάνει και την περιήγησή του σε σελίδες ενός δικτυακού τόπου ή μια σειρά από δικτυακούς τόπους. Κατά τη διάρκεια της συνεδρίας του, ο χρήστης αναδιαμορφώνει την αναζήτησή του. Ωστόσο, τόσο ο καθορισμός της σημασιολογίας της αναζήτησής του, όσο και ο προσανατολισμός της αναζήτησής του (γενίκευση ή εξειδίκευση σε ένα σημασιολογικό πεδίο) με βάση την πλοήγηση μέσα από τις σελίδες, δεν είναι τόσο εύκολοι. Κάθε σελίδα περιέχει περισσότερες από μία έννοιες. Επιπλέον, η επιλογή των αντιπροσωπευτικότερων είναι πολύπλοκη διαδικασία. Σκοπός της παρούσας εργασίας είναι η παρουσίαση της μεθοδολογίας SOSACT. Η μεθοδολογία SOSACT αποτελεί μια σημασιολογική μεθοδολογία εξατομίκευσης που παρακολουθεί τις επιλογές του χρήστη κατά τη συνεδρία του και καθορίζει αν ο χρήστης ειδικεύει ή γενικεύει την πλοήγηση του μέσα από τη σημασιολογική ανάλυση των σελίδων, σε ένα εννοιολογικό πεδίο. Η μεθοδολογία SOSACT ορίζει το σημασιολογικό προσανατολισμό της πλοήγησης του χρήστη. Επιπλέον, στην παρούσα εργασία προτείνεται ο αλγόριθμος SOSACT, ο οποίος εντοπίζει το σημασιολογικό προσανατολισμό του χρήστη με τη βοήθεια μίας ταξινομίας. Η μεθοδολογία SOSACT υλοποιείται από το σύστημα SOSACT. Το σύστημα SOSACT εφαρμόζει τον αλγόριθμο SOSACT και προτείνει χρήσιμες συστάσεις προς το χρήστη για τη βελτίωση της διαδικτυακής αναζήτησής του . 
Το σύστημα SOSACT αξιολογήθηκε με τη χρησιμοποίηση πραγματικής δραστηριότητας χρηστών σε μια ιστοσελίδα, για ορισμένο χρονικό διάστημα. Η μεθοδολογία SOSACT μπορεί να εφαρμοστεί και σε ένα σώμα κειμένων και όχι μόνο σε διαδικτυακές πηγές. Μπορεί να γίνει ένα χρήσιμο εργαλείο για τη βελτίωση της πλοήγησης στο διαδίκτυο. Επιπλέον, η προτεινόμενη μεθοδολογία μπορεί να γεφυρώσει τις τεχνικές αποσαφήνισης του επερωτήματος στις μηχανές αναζήτησης και τις τεχνικές αναδιαμόρφωσης του αντικειμένου περιήγησης. Η μεθοδολογία SOSACT θα μπορούσε να χρησιμοποιηθεί σε μια συγκριτική μελέτη μεταξύ των δύο αυτών τομέων και να οδηγήσει σε νέες τεχνικές και στις δύο περιοχές έρευνας του Σημασιολογικού Ιστού. / In recent years, the spread of the World Wide Web, as well as the range of information available to the user, has made semantic personalization techniques a necessity for enhancing the user experience on the web. In search engines, users refine their query by adding, removing or replacing keywords. Thus, query refinement is easy to detect, and it is easy to tell whether a user generalizes or specializes his web search. Nevertheless, besides interaction with a search engine, a user's web search involves browsing and navigating through the pages of a web site or a number of web sites while seeking the right information. During this session the user reformulates his search. However, defining search orientation (generalization or specialization) based on navigation through web pages is not that easy. Each page contains more than one concept. Furthermore, the concepts may be developed to the same extent, which makes it difficult to determine the representative semantics of a given page and thus the orientation of a user session. 
In order to define the orientation of user navigation, a semantic web personalization methodology, the SOSACT methodology, is developed. It tracks the user's hits through a session and determines whether the user specializes or generalizes his navigation through semantic analysis of the pages in his session window. Moreover, the SOSACT algorithm is proposed for capturing user session orientation based on a concept taxonomy. The SOSACT methodology is implemented by the SOSACT system. The SOSACT system applies the SOSACT algorithm and proposes useful recommendations to the user to improve his web search. The SOSACT system is evaluated on real user activity in a web site over a certain period of time. The experimental outcomes matched the expected results. The SOSACT methodology could become a useful tool for navigation refinement. Furthermore, this work is shown to bridge search engine query refinement and browsing reformulation techniques. It could serve as the basis for a comparative study between these two fields and lead to new techniques in both areas or to migration techniques between them. Ταξινομία Σημασιολογικός Ιστός Κυριαρχία εννοιών 025.042 5 Semantic orientation Taxonomy Semantic Web Query reformulation Semantic analysis Concept dominance SOSACT
8	Σημασιολογικές μηχανές αναζήτησης Παγκόσμιου Ιστού / Semantic web clustering engines Καναβός, Ανδρέας 11 June 2012 (has links) Search engines are an invaluable tool for retrieving information from the web. In response to user queries, they return a list of results ranked by the relevance of their content to the query. However, although search engines are certainly quite good at handling specific queries, such as finding a particular web page, they can be less effective for queries that are ambiguous to them, for example in cases of polysemy, where a word can take more than one meaning depending on the context of the sentence. Another example is a query that covers more than two subcategories and meanings, which means that the user has to go through a large number of results to find the ones of interest. The aim of this thesis is to develop an expert system that post-processes the answers of a classic search engine and groups the results into a hierarchy of categories based on their content. The most important current solutions to the problem of assigning results to clusters are the systems Vivisimo, Carrot, CREDO and SnakeT. The contribution proposed in this thesis is the use of a series of techniques that improve the quality of the answer groups. A novel technique used in this work is query reformulation through various strategies. 
Such strategies are presented because users often modify a previous search query in order to retrieve better results, or because they frequently cannot formulate a query correctly since they do not know what the desired results look like. In addition, we took advantage of Wikipedia, drawing data from page titles as well as from the categories to which those pages belong. This is done by linking the frequent terms in the texts of the search results to the semantic encyclopedia Wikipedia, in order to extract the different senses and meanings of each term. In particular, Wikipedia is searched for a page (or pages, in the case of polysemy) corresponding to these terms, so that the title and category can be used as additional information. Finally, Wikipedia is also used in assigning labels to the final clusters, as additional information for each individual text in the cluster. / - Σημασιολογικός ιστός Ομαδοποίηση Ανάκτηση πληροφορίας Ανάθεση ετικετών Δεικτοδότηση 025.042 7 Semantic web Clustering Data mining Labeling Annotation Query reformulation
9	Personalized Access to Contextual Information by using an Assistant for Query Reformulation / Personnalisation et Adaptation de L’accès à L’information Contextuelle en utilisant un Assistant Intelligent Asfari, Ounas 19 September 2011 (has links) Les travaux présentés dans cette thèse rentrent dans le cadre de la Recherche d'Information (RI) et s'intéressent à une des questions de recherche actuellement en vogue dans ce domaine: la prise en compte du contexte de l'utilisateur pendant sa quête de l'information pertinente. Nous proposons une approche originale de reformulation automatique de requêtes basée sur le profil utilisateur et sa tâche actuelle. Plus précisément, notre approche tient compte deux éléments du contexte, les centres d'intérêts de l'utilisateur (son profil) et la tâche qu'il réalise, pour suggérer des requêtes appropriées à son contexte. Nous proposons, en particulier, toute une démarche originale permettant de bien interpréter et réécrire la requête initiale en fonction des activités réalisées dans la tâche courante de l'utilisateur.Nous considérons qu'une tâche est jalonnée par des activités, nous proposons alors d'interpréter le besoin de l'utilisateur, représenté initialement par la requête, selon ses activités actuelles dans la tâche (et son profil) et de suggérer des reformulations de requêtes appropriées à ces activités.Une implémentation de cette approche est faite, et elle est suivie d’une étude expérimentale. Nous proposons également une procédure d'évaluation qui tient compte l'évaluation des termes d'expansion, et l'évaluation des résultats retournés en utilisant les requêtes reformulées, appelés SRQ State Reformulated Query. Donc, trois facteurs d’évaluation sont proposés sur lesquels nous nous appuierons pour l'analyse et l'évaluation des résultats. L’objective est de quantifier l'amélioration apportée par notre système dans certains contextes par rapport aux autres systèmes. 
Nous prouvons que notre approche qui prend en compte la tâche actuelle de l'utilisateur est effectivement plus performante que les approches basées, soit uniquement sur la requête initiale, ou encore celle basée sur la requête reformulée en considérant uniquement le profil de l'utilisateur. / Access to relevant information adapted to the needs and the context of the user is a real challenge in Web search, owing to the growth of heterogeneous resources and the variety of data on the web. There are always certain needs behind a user query; these queries are often ambiguous and short, so we need to handle them intelligently to satisfy the user's needs. To improve user query processing, we present a context-based hybrid method for query expansion that automatically generates new reformulated queries in order to guide the information retrieval system to provide context-based personalized results depending on the user profile and his/her context. Here, we consider the user context to be the actual state of the task that the user is undertaking when the information retrieval process takes place. Thus State Reformulated Queries (SRQ) are generated according to the task states and the user profile, which is constructed by considering related concepts from existing concepts in a domain ontology. Using a task model, we show that it is possible to determine the user's current task automatically. We present an experimental study in order to quantify the improvement provided by our system compared to the direct querying of a search engine without reformulation, or compared to the personalized reformulation based on a user profile only. The preliminary results show the relevance of our approach in certain contexts. Recherche d’information, Reformulation de requêtes, Contexte de l’utilisateur Modélisation des tâches, Personnalisation Profil utilisateur. Information Retrieval Query Reformulation User Context Task modeling Personalization User profile
10	Répondre efficacement aux requêtes Big Data en présence de contraintes / Efficient Big Data query answering in the presence of constraints Bursztyn, Damián 15 December 2016 (has links) Les contraintes sont les artéfacts fondamentaux permettant de donner un sens aux données. Elles garantissent que les données sont conformes aux besoins des applications. L'objet de cette thèse est d'étudier deux problématiques liées à la gestion efficace des données en présence de contraintes. Nous abordons le problème de répondre efficacement à des requêtes portant sur des données, en présence de contraintes déductives. Cela mène à des données implicites dérivant de données explicites et de contraintes. Les données implicites requièrent une étape de raisonnement afin de calculer les réponses aux requêtes. Le raisonnement par reformulation des requêtes compile les contraintes dans une requête modifiée qui, évaluée à partir des données explicites uniquement, génère toutes les réponses fondées sur les données explicites et implicites. Comme les requêtes reformulées peuvent être complexes, leur évaluation est souvent difficile et coûteuse. Nous étudions l'optimisation de la technique de réponse aux requêtes par reformulation dans le cadre de l'accès aux données à travers une ontologie, où des requêtes conjonctives SPARQL sont posées sur un ensemble de faits RDF sur lesquels des contraintes RDF Schema (RDFS) sont exprimées. La thèse apporte les contributions suivantes. (i) Nous généralisons les langages de reformulation de requêtes précédemment étudiées, afin d'obtenir un espace de reformulations d'une requête posée plutôt qu'une unique reformulation. (ii) Nous présentons des algorithmes effectifs et efficaces, fondés sur un modèle de coût, permettant de sélectionner une requête reformulée ayant le plus faible coût d'évaluation. 
(iii) Nous montrons expérimentalement que notre technique améliore significativement la performance de la technique de réponse aux requêtes par reformulation. Au-delà de RDFS, nous nous intéressons aux langages d'ontologie pour lesquels répondre à une requête peut se réduire à l'évaluation d'une certaine formule de la Logique du Premier Ordre (obtenue à partir de la requête et de l'ontologie), sur les faits explicites uniquement. (iv) Nous généralisons la technique de reformulation optimisée pour RDF, mentionnée ci-dessus, aux formalismes pour répondre à une requête LPO-réductible. (v) Nous appliquons cette technique à la Logique de Description DL-LiteR sous-jacente au langage OWL2 QL du W3C, et montrons expérimentalement ses avantages dans ce contexte. Nous présentons également, brièvement, un travail en cours sur le problème consistant à fournir des chemins d'accès efficaces aux données dans les systèmes Big Data. Nous proposons d'utiliser un ensemble de systèmes de stockages hétérogènes afin de fournir une meilleure performance que n'importe lequel d'entre eux, utilisé individuellement. Les données stockées dans chaque système peuvent être décrites comme des vues matérialisées sur les données applicatives. Répondre à une requête revient alors à réécrire la requête à l'aide des vues disponibles, puis à décoder la réécriture produite comme un ensemble de requêtes à exécuter sur les systèmes stockant les vues, ainsi qu'une requête les combinant de façon appropriée. / Constraints are the essential artefact for giving meaning to data, ensuring that it fits real-life application needs, and that its meaning is correctly conveyed to the users. This thesis investigates two fundamental problems related to the efficient management of data in the presence of constraints. We address the problem of efficiently answering queries over data in the presence of deductive constraints, which lead to implicit data that is entailed (derived) from the explicit data and the constraints. 
Implicit data requires a reasoning step in order to compute complete query answers, and two main query answering techniques exist. Data saturation compiles the constraints into the database by making all implicit data explicit, while query reformulation compiles the constraints into a modified query, which, evaluated over the explicit data only, computes all the answers due to explicit and/or implicit data. So far, reformulation-based query answering has received significantly less attention than saturation. In particular, reformulated queries may be complex, thus their evaluation may be very challenging. We study optimizing reformulation-based query answering in the setting of ontology-based data access, where SPARQL conjunctive queries are answered against a set of RDF facts on which constraints hold. When RDF Schema is used to express the constraints, the thesis makes the following contributions. (i) We generalize prior query reformulation languages, leading to a space of reformulated queries we call JUCQs (joins of unions of conjunctive queries), instead of a single fixed reformulation. (ii) We present effective and efficient cost-based algorithms for selecting, from this space, a reformulated query with the lowest estimated cost. (iii) We demonstrate through experiments that our technique drastically improves the performance of reformulation-based query answering while always avoiding “worst-case” performance. Moving beyond RDFS, we consider the large and useful set of ontology languages enjoying FOL reducibility of query answering: answering a query can be reduced to evaluating a certain first-order logic (FOL) formula (obtained from the query and ontology) against only the explicit facts. 
(iv) We generalize the above-mentioned JUCQ-based optimized reformulation technique to improve performance in any FOL-reducible setting, and (v) we instantiate this framework to the DL-LiteR Description Logic underpinning the W3C’s OWL2 QL ontology language, demonstrating significant performance advantages in this setting also. We also report on current work regarding the problem of providing efficient data access paths in Big Data stores. We consider a setting where a set of different, heterogeneous storage systems can be used side by side to provide better performance than any of them used individually. In such a setting, the data stored in each system can be described as views over the application data. Answering a query thus amounts to rewriting the query using the available views, and then decoding the rewriting into a set of queries to be executed on the systems holding the views, and a query combining them appropriately. Web sémantique Optimisation des requêtes Reformulation des requêtes Polystores Semantic Web Query optimization Query reformulation Query answering under constraints Hybrid stores
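The two query answering techniques contrasted in the abstract above, data saturation and query reformulation, can be illustrated on a toy example. The sketch below is an assumption-laden simplification: it handles only a subclass constraint (in the spirit of `rdfs:subClassOf`) and only queries of the form "all instances of class C". The data, the class names, and every function name are hypothetical, and it implements neither the JUCQ space nor the cost-based selection of the thesis.

```python
# Explicit facts only: (subject, property, object) triples.
FACTS = {
    ("alice", "rdf:type", "PhDStudent"),
    ("bob", "rdf:type", "Professor"),
    ("carol", "rdf:type", "Person"),
}

# Constraints as (subclass, superclass) pairs.
SUBCLASS_OF = {
    ("PhDStudent", "Student"),
    ("Student", "Person"),
    ("Professor", "Person"),
}


def instances(facts, cls):
    """Evaluate the atomic query 'x rdf:type cls' over the given facts."""
    return {s for s, p, o in facts if p == "rdf:type" and o == cls}


def saturate(facts):
    """Data saturation: add implied rdf:type triples until a fixpoint."""
    saturated = set(facts)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(saturated):
            if p != "rdf:type":
                continue
            for sub, sup in SUBCLASS_OF:
                implied = (s, "rdf:type", sup)
                if sub == o and implied not in saturated:
                    saturated.add(implied)
                    changed = True
    return saturated


def subclasses(cls):
    """Return cls together with all of its transitive subclasses."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in SUBCLASS_OF:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found


def answer_by_saturation(cls):
    """Compile the constraints into the data, then run the original query."""
    return instances(saturate(FACTS), cls)


def answer_by_reformulation(cls):
    """Compile the constraints into the query: a union of atomic queries,
    one per subclass, evaluated over the explicit facts only."""
    result = set()
    for c in subclasses(cls):
        result |= instances(FACTS, c)
    return result


if __name__ == "__main__":
    print(sorted(answer_by_saturation("Person")))
    print(sorted(answer_by_reformulation("Person")))
```

Both functions return the same answers; the difference is where the reasoning cost is paid. Saturation grows the stored data once, while reformulation leaves the data untouched and grows the query, which is why the size and shape of the reformulated query matters for performance.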

Search results