  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Automated Vocabulary Building for Characterizing and Forecasting Elections using Social Media Analytics

Mahendiran, Aravindan 12 February 2014 (has links)
Twitter has become a popular data source over the past decade and has garnered significant attention as a surrogate data source for many important forecasting problems. Strong correlations have been observed between Twitter indicators and real-world trends spanning elections, stock markets, book sales, and flu outbreaks. A key ingredient of all methods that use Twitter for forecasting is agreeing on a domain-specific vocabulary to track the pertinent tweets, which is typically provided by subject matter experts (SMEs). The language used on Twitter differs drastically from other forms of online discourse, such as news articles and blogs, and it constantly evolves as users adopt popular hashtags to express their opinions. Thus, the vocabulary used by forecasting algorithms needs to be dynamic and should capture emerging trends over time. This thesis proposes a novel unsupervised learning algorithm that builds a dynamic vocabulary using Probabilistic Soft Logic (PSL), a framework for probabilistic reasoning over relational domains. Using eight presidential elections from Latin America, we show how our query expansion methodology improves the performance of traditional election forecasting algorithms. Through this approach we achieve close to a two-fold increase in the number of tweets retrieved for predictions and a 36.90% reduction in prediction error. / Master of Science
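The dynamic-vocabulary idea above can be illustrated with a minimal sketch. This is not the thesis's PSL formulation: it is a simple co-occurrence heuristic, and the hashtags, thresholds, and function name are invented for illustration. A seed vocabulary is grown by promoting hashtags that frequently co-occur with currently tracked terms; rerunning per time window lets the vocabulary follow emerging trends.

```python
from collections import Counter

def expand_vocabulary(tweets, seed_terms, top_k=3, min_count=2):
    """Grow the tracking vocabulary by promoting hashtags that co-occur
    with currently tracked terms in matched tweets."""
    seeds = set(seed_terms)
    cooc = Counter()
    for tweet in tweets:
        tokens = set(tweet.lower().split())
        if tokens & seeds:                      # tweet matched by the vocabulary
            for tok in tokens - seeds:
                if tok.startswith("#"):
                    cooc[tok] += 1
    promoted = [t for t, c in cooc.most_common(top_k) if c >= min_count]
    return list(seed_terms) + promoted

tweets = [
    "vote #eleccion2012 con #capriles",
    "gran marcha #capriles #eleccion2012 hoy",
    "#eleccion2012 resultados hoy",
    "nada que ver con futbol",
]
print(expand_vocabulary(tweets, ["#eleccion2012"]))  # → ['#eleccion2012', '#capriles']
```

Iterating this over successive time windows is what makes the tracked vocabulary dynamic rather than a fixed SME-provided list.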
32

Contribution à l'analyse et l'évaluation des requêtes expertes : cas du domaine médical / Contribution to the analysis and evaluation of expert queries: the case of the medical domain

Znaidi, Eya 30 June 2016 (has links)
Information retrieval (IR) requires strategies that consist of (1) identifying the information need; (2) formulating the information need; (3) locating the relevant sources; (4) identifying the tools to use for those sources; (5) querying the tools; and (6) evaluating the quality of the results. The field has evolved continuously, producing techniques and approaches for selecting, from a corpus of documents, the relevant information that satisfies the need expressed by the user. Moreover, in the applied context of biomedical IR, heterogeneous information sources are in constant evolution, in both structure and content. Likewise, information needs may be expressed by users with very different profiles: medical experts such as practitioners, clinicians, and health professionals; novice users (with no domain expertise or knowledge) such as patients and their families; and so on. Biomedical IR faces several challenges: (1) the variation and diversity of information needs, (2) the different types of medical knowledge, (3) the differences in linguistic competence between experts and novices, (4) the sheer volume of the medical literature, and (5) the nature of the medical IR task itself. Together these make it difficult to access the relevant information specific to the search context, especially for the domain experts whom it would support in medical decision making. This thesis falls within the field of biomedical IR and addresses the challenges of formulating expert information needs and identifying relevant sources in order to better answer clinical needs.
Concerning the formulation and analysis of expert queries, we propose exploratory analyses of query attributes that we defined, formalized, and computed, namely: (1) two length attributes, in number of terms and in number of concepts; (2) two facets of specificity, term-document and hierarchical; and (3) query clarity, both relevance-based and topic-based. We conducted statistical studies and analyses on collections from different medical evaluation campaigns (CLEF and TREC) so as to cover the different IR tasks. After the descriptive analyses, we studied pairwise correlations between query attributes as well as multidimensional correlation analyses, and we then studied the impact of these correlations on retrieval performance. This allowed us to compare and characterize the different queries by medical task in a more generalizable way. Concerning information access, we propose semantic query matching and expansion techniques for IR based on clinical evidence. / The research topic of this document is a particular setting of medical information retrieval (IR), referred to as expert-based information retrieval. We are interested in information needs expressed by medical domain experts such as practitioners and physicians. It is well known in IR that expressing queries which accurately reflect the information need is a difficult task, in general domains as well as specialized ones, even for expert users. The identification of the intent hidden behind the queries that users submit to a search engine is therefore a challenging issue.
Moreover, the increasing amount of health information available from various sources, such as government agencies, non-profit and for-profit organizations, and internet portals, presents both opportunities and issues for improving the delivery of health-care information to medical professionals, patients, and the general public. One critical issue is understanding users' search strategies and tactics in order to bridge the gap between their intent and the delivered information. In this thesis, we focus on two main aspects of expert medical information needs: (1) Understanding the intent behind queries is critically important to gain insight into how to select relevant results. While many studies have investigated how users in general carry out exploratory health searches in digital environments, few have focused on how queries are formulated, specifically by domain expert users. We address domain expert health search through the analysis of query attributes, namely length, specificity, and clarity, using appropriate proposed measures built from different sources of evidence. We undertake an in-depth statistical analysis of queries issued in IR evaluation campaigns, namely the Text REtrieval Conference (TREC) and the Conference and Labs of the Evaluation Forum (CLEF), devoted to different medical tasks within controlled evaluation settings. (2) We address the issue of answering PICO (Population, Intervention, Comparison, Outcome) clinical queries formulated within the Evidence-Based Medicine framework. The contributions of this part include (i) a new algorithm for query elicitation based on the semantic mapping of each facet of the query to a reference terminology, and (ii) a new document ranking model based on a prioritized aggregation operator.
We tackle the retrieval of the best evidence that fits a PICO question, an underexplored research area, and propose a new document ranking algorithm that relies on semantic query expansion driven by each question facet. The expansion is moreover bounded by the local search context, to better discard irrelevant documents. The experimental evaluation carried out on the CLIREC dataset shows the benefit of our approaches.
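The clarity attribute analyzed in this record is commonly computed, in the IR literature, as the KL divergence between a language model of the top-retrieved (pseudo-relevant) documents and the collection model. A minimal sketch follows, with whitespace tokenization and toy documents standing in for a real medical collection; it is not the thesis's exact measure.

```python
import math
from collections import Counter

def clarity(retrieved_docs, collection_docs):
    """Clarity score: KL divergence between the language model of the
    retrieved documents and the collection model. Higher values suggest
    a more focused, less ambiguous query."""
    def model(docs):
        counts = Counter(w for d in docs for w in d.lower().split())
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}
    p_ret = model(retrieved_docs)
    p_col = model(collection_docs)
    return sum(p * math.log2(p / p_col[w]) for w, p in p_ret.items() if w in p_col)

collection = ["heart attack symptoms", "heart disease treatment",
              "weather today sunny", "sports results today"]
print(clarity(collection[:2], collection))  # > 0: focused result set
print(clarity(collection, collection))      # 0.0: no divergence from the collection
```

A query whose top results concentrate on a few topical terms diverges more from the collection model, which is why clarity serves as a query-difficulty attribute.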
33

Systematisierung und Evaluierung von Clustering-Verfahren im Information Retrieval / Systematization and evaluation of clustering methods in information retrieval

Kürsten, Jens 04 December 2006 (has links) (PDF)
This diploma thesis investigates methods of cluster analysis and their possible applications for improving the retrieval results of information retrieval systems. A systematic analysis of the cluster analysis methods established in the research literature forms the basis for a comparative evaluation of promising approaches, carried out within the Domain Specific Monolingual Tasks of the Cross-Language Evaluation Forum 2006. Selected clustering methods are implemented within an existing Lucene-based retrieval system. In addition, this system is extended with components for query expansion and data fusion; both approaches are well established in research on the automatic optimization of retrieval results and therefore serve as the baseline for assessing the implemented cluster-analysis-based concepts for improving retrieval results. The results show local document clustering based on the k-means algorithm, combined with pseudo-relevance feedback for selecting the documents used in query expansion, to be particularly promising. Furthermore, it is shown that data fusion based on the Z-score operator can combine the results of different indexing methods in such a way that very good and, above all, very robust retrieval results are achieved. / Within this diploma thesis, widely used cluster analysis approaches are studied with respect to their application to optimizing the results of information retrieval systems. A systematic analysis of established methods of cluster analysis forms the basis of the comparative evaluation of promising approaches for using cluster analysis to optimize retrieval results. The evaluation is accomplished through participation in the Domain Specific Monolingual Tasks of the Cross-Language Evaluation Forum 2006.
The implementation of selected clustering approaches is realized within an existing Lucene-based retrieval system. Within the scope of this work, the system is supplemented with components for query expansion and data fusion. Both approaches have prevailed in research on the automatic optimization of retrieval results; they therefore serve as the basis of assessment for the implemented methods, which aim at improving retrieval results and are based on cluster analysis. The results show that selecting documents for query expansion with the help of local document clustering, based on the k-means clustering algorithm combined with a blind (pseudo-relevance) feedback approach, is very promising. Furthermore, data fusion based on the Z-score operator proves very useful for combining the retrieval results of different indexing methods; in fact, this approach achieves very good and, in particular, very robust retrieval results.
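The Z-score fusion step described above can be sketched as follows. This is a hedged approximation (CombSUM over per-run standardized scores); the run names and scores are invented, and the thesis's exact operator may differ in detail.

```python
import statistics

def zscore_fusion(runs):
    """Each run's document scores are standardized (z-scores), then summed
    per document across runs; documents are returned best-first."""
    fused = {}
    for run in runs:                                    # run: {doc_id: score}
        mean = statistics.mean(run.values())
        stdev = statistics.pstdev(run.values()) or 1.0  # guard: constant-score run
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + (score - mean) / stdev
    return sorted(fused, key=fused.get, reverse=True)

bm25_run = {"d1": 3.0, "d2": 2.0, "d3": 1.0}   # hypothetical indexing run A
stem_run = {"d1": 10.0, "d2": 1.0, "d3": 4.0}  # hypothetical indexing run B
print(zscore_fusion([bm25_run, stem_run]))  # → ['d1', 'd2', 'd3']
```

Standardizing before summing is what makes the fusion robust: runs with very different score ranges contribute on a comparable scale.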
34

Short text contextualization in information retrieval : application to tweet contextualization and automatic query expansion / Contextualisation de textes courts pour la recherche d'information : application à la contextualisation de tweets et à l'expansion automatique de requêtes.

Ermakova, Liana 31 March 2016 (has links)
Efficient communication tends to follow the law of least effort: when using a given language, interlocutors do not want to work harder than necessary to be understood. This leads to extreme compression of texts, especially in electronic communication such as microblogs, SMS, and search engine queries. However, these texts are often not self-sufficient, since understanding them requires knowledge of the terminology, named entities, or related facts. The main task of the research presented in this doctoral thesis is therefore to provide the context of a short text to a user or to a system, such as a search engine. The first objective of our work is to help the user better understand a short message by extracting context from an external source such as the Web or Wikipedia, by means of automatically constructed summaries. To this end we propose an approach to automatic multi-document summarization and apply it to message contextualization, in particular tweet contextualization. The proposed method is based on named entity recognition, part-of-speech weighting, and sentence quality measurement. Unlike previous work, we introduce a smoothing algorithm based on the local context. Our approach draws on the theme-rheme structure of texts. In addition, we developed a graph-based algorithm for sentence reordering. The method was evaluated on the INEX/CLEF Tweet Contextualization task over a period of four years, and was also adapted to snippet generation. The evaluation results attest to the good performance of our approach. / Efficient communication tends to follow the principle of least effort.
According to this principle, when using a given language, interlocutors do not want to work any harder than necessary to reach understanding. This leads to extreme compression of texts, especially in electronic communication, e.g. microblogs, SMS, and search queries. However, these texts are often not self-contained, and understanding them requires knowledge of terminology, named entities, or related facts. The main goal of this research is to provide such context to a user or a system from a textual resource. The first aim of this work is to help a user better understand a short message by extracting context from an external source, such as a text collection, the Web, or Wikipedia, by means of text summarization. To this end we developed an approach to automatic multi-document summarization and applied it to short message contextualization, in particular tweet contextualization. The proposed method is based on named entity recognition, part-of-speech weighting, and sentence quality measurement. In contrast to previous research, we introduced an algorithm for smoothing from the local context. Our approach exploits the topic-comment structure of a text. Moreover, we developed a graph-based algorithm for sentence reordering. The method was evaluated at the INEX/CLEF Tweet Contextualization track; we provide evaluation results over the track's four years. The method was also adapted to snippet retrieval. The evaluation results indicate good performance of the approach.
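A crude sketch of the sentence-scoring idea in this abstract: named-entity recognition is replaced here by a capitalization heuristic, and the weights, tokenization, and example terms are illustrative rather than the thesis's actual method.

```python
def score_sentence(sentence, tweet_terms, ne_weight=2.0):
    """Weighted overlap between a candidate context sentence and the tweet's
    terms; capitalized matches (a crude named-entity proxy) count double,
    and the score is normalized by sentence length."""
    tokens = sentence.split()
    score = 0.0
    for tok in tokens:
        word = tok.strip(".,;:!?")
        if word.lower() in tweet_terms:
            score += ne_weight if word[:1].isupper() else 1.0
    return score / (len(tokens) or 1)

terms = {"obama", "election"}
print(score_sentence("Obama won the election.", terms))  # → 0.75
print(score_sentence("The weather was nice.", terms))    # → 0.0
```

Ranking Wikipedia or Web sentences by such a score, then reordering the top ones, is the general shape of summary-based contextualization.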
35

Relevance Analysis for Document Retrieval

Labouve, Eric 01 March 2019 (has links) (PDF)
Document retrieval systems recover documents from a dataset and order them according to their perceived relevance to a user's search query. This is a difficult task for machines to accomplish because there exists a semantic gap between the meaning of the terms in a user's literal query and the user's true intentions. Even with the ambiguity that arises from a lack of context, users still expect the set of documents returned by a search engine to be both highly relevant to their query and properly ordered. The focus of this thesis is on document retrieval systems that explore methods of ordering documents from unstructured, textual corpora using text queries. The main goal of this study is to enhance the Okapi BM25 document retrieval model. In doing so, this research hypothesizes that the structure of text inside documents and queries holds valuable semantic information that can be incorporated into the Okapi BM25 model to increase its performance. Modifications that account for a term's part of speech, the proximity between a pair of related terms, the proximity of a term with respect to its location in a document, and query expansion are used to augment Okapi BM25. The study resulted in 87 modifications, all of which were validated using open-source corpora. The top-scoring modification from the validation phase was then tested on the LISA corpus, where the model performed 10.25% better than Okapi BM25 when evaluated by mean average precision. When compared against two industry-standard search engines, Lucene and Solr, the top-scoring modification outperforms these systems by up to 21.78% and 23.01%, respectively.
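The baseline Okapi BM25 model that this thesis modifies can be sketched as follows, using the standard formula with the usual k1 and b defaults; the tokenized toy corpus is an assumption for illustration.

```python
import math

def bm25_score(query_terms, doc, docs, k1=1.2, b=0.75):
    """Baseline Okapi BM25 score of one tokenized document against a
    bag-of-words query, with the standard idf and length normalization."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in docs if term in d)   # document frequency
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)
        tf = doc.count(term)                      # term frequency in this doc
        norm = tf + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf * (k1 + 1) / norm
    return score

docs = [["cat", "sat"], ["dog", "ran", "far"], ["cat", "cat", "sat"]]
```

The thesis's modifications (part of speech, term proximity, position, query expansion) would add further factors to this per-term score.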
36

Extracting and exploiting word relationships for information retrieval

Cao, Guihong January 2008 (has links)
Thesis digitized by the Division de la gestion de documents et des archives, Université de Montréal.
37

Applying Enterprise Models as Interface for Information Searching

MATONGO, Tanguy, DEGBELO, Auriol January 2009 (has links)
Nowadays, more and more companies use Enterprise Models to integrate and coordinate their business processes with the aim of remaining competitive in the market. Enterprise Models therefore play a critical role in this integration, enabling improvement of the enterprise's objectives and of the ways to reach them in a given period of time. Through Enterprise Models, companies are able to improve the management of their operations, actors, and processes, and also to improve communication within the organisation. This thesis describes another use of Enterprise Models. In this work, we apply Enterprise Models as an interface for information searching. The underlying motivation is to show that Enterprise Models can be more than just models: they can be used in a more dynamic way, through a software program for information searching. The program first extracts the information contained in the Enterprise Models (which are stored in an XML file on the system). Once extracted, this information is used to express a query, which is sent to a search engine to retrieve documents relevant to the query and return them to the user. The thesis was carried out over an entire academic semester. The results of this work are a report summarizing the knowledge gained in the field of study and a software program built as a proof of concept for testing the theories.
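The extraction step might look like the following sketch. The XML schema, element names, and query construction are invented for illustration; the actual Enterprise Model format used by the thesis's prototype is not specified here.

```python
import xml.etree.ElementTree as ET

# Hypothetical model fragment; real Enterprise Model schemas differ.
XML = """<enterpriseModel>
  <process name="order handling">
    <actor>sales clerk</actor>
    <goal>reduce delivery time</goal>
  </process>
</enterpriseModel>"""

def model_to_query(xml_text):
    """Collect the textual labels and attribute values of an Enterprise
    Model into a flat keyword query for a search engine."""
    root = ET.fromstring(xml_text)
    terms = []
    for elem in root.iter():
        if elem.text and elem.text.strip():
            terms.append(elem.text.strip())
        terms.extend(v for v in elem.attrib.values())
    return " ".join(terms)

print(model_to_query(XML))  # → order handling sales clerk reduce delivery time
```

The resulting keyword string would then be submitted to a search engine, as the abstract describes.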
38

Αυτόματη επιλογή σημασιολογικά συγγενών όρων για την επαναδιατύπωση των ερωτημάτων σε μηχανές αναζήτησης πληροφορίας / Automatic selection of semantic related terms for reformulating a query into a search engine

Κοζανίδης, Ελευθέριος 14 September 2007 (has links)
Query refinement is the process of suggesting alternative terms to Web search engine users for formulating their information need. Although alternative query formulations can contribute to improving the retrieved results, their use by Web users is quite limited, since the terms of the refined queries carry almost no information about their degree of similarity to the original query terms, nor do they indicate their degree of relevance to the user's information interests. Traditionally, alternative query formulations are determined exclusively by the semantic relation that the supplementary terms exhibit with the original query terms, without taking into account the search goal underlying the user's query. In this work we present a novel query refinement technique that uses a lexical ontology to identify alternative query formulations which, on the one hand, describe the object of the user's search and, on the other, relate to the queries the user submitted. The most innovative feature of our technique is the visual representation of the alternative query in the form of a hierarchically structured graph. This representation provides explicit information about the semantic relation between the terms of the refined query and the terms the user employed to express the information need, while also allowing the user to choose which candidate terms will ultimately take part in the refinement process, interactively building the new query.
The results of the experiments we conducted to evaluate the performance of our technique are very satisfactory, leading us to conclude that our method can substantially ease the user's task of selecting queries for retrieving information from the World Wide Web. / Query refinement is the process of providing Web information seekers with alternative wordings for expressing their information needs. Although alternative query formulations may contribute to the improvement of retrieval results, their adoption by Web users is intrinsically limited, in that alternative query wordings convey no explicit information about either the degree or the type of their correlation to the user-issued queries. Moreover, alternative query formulations are determined from the semantics of the issued query alone, without considering the search intentions of the user issuing that query. In this work, we introduce a novel query refinement technique which uses a lexical ontology to identify alternative query formulations that are both informative of the user's interests and related to the user-selected queries. The most innovative feature of our technique is the visualization of the alternative query wordings in a graphical representation, which conveys explicit information about the refined queries' correlation to the user-issued requests and allows the user to select which terms participate in the refinement process. Experimental results demonstrate that our method has significant potential to improve the user search experience.
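A minimal sketch of ontology-driven refinement: a toy is-a dictionary stands in for the lexical ontology (e.g. WordNet), and the entries are invented. The returned (term, relation, depth) triples are the kind of information the technique renders as a hierarchical graph, so the user can see how each suggestion relates to the original query term.

```python
# Toy is-a dictionary standing in for a lexical ontology; entries invented.
ONTOLOGY = {
    "jaguar": {"hypernym": "big cat"},
    "big cat": {"hypernym": "animal"},
}

def refine_query(term):
    """Walk the is-a chain upward, returning (related term, relation, depth)
    triples that describe how each candidate relates to the query term."""
    path, depth = [], 1
    while term in ONTOLOGY:
        parent = ONTOLOGY[term]["hypernym"]
        path.append((parent, "hypernym", depth))
        term, depth = parent, depth + 1
    return path

print(refine_query("jaguar"))  # → [('big cat', 'hypernym', 1), ('animal', 'hypernym', 2)]
```

Exposing the relation type and depth alongside each suggestion is what lets the user judge, and interactively select, the candidate refinement terms.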
39

Automatic Concept-Based Query Expansion Using Term Relational Pathways Built from a Collection-Specific Association Thesaurus

Lyall-Wilson, Jennifer Rae January 2013 (has links)
The dissertation research explores an approach to automatic concept-based query expansion to improve search engine performance. It uses a network-based approach for identifying the concept represented by the user's query, founded on the idea that a collection-specific association thesaurus can provide a reasonable representation of all the concepts within the document collection as well as the relationships these concepts have to one another. Because the representation is generated from the association thesaurus data, a mapping exists between the representation of the concepts and the terms used to describe them. The research applies to search engines designed for an individual website whose content focuses on a specific conceptual domain. Both the document collection and the subject content are therefore well bounded, which makes it possible to use techniques not currently feasible for general-purpose search engines covering the entire web.
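A simple construction of such a collection-specific association thesaurus might look like this sketch: document-level co-occurrence with a count threshold. Real systems would use weighted association measures rather than raw counts, and the toy documents are invented.

```python
from collections import Counter
from itertools import combinations

def association_thesaurus(docs, min_cooc=2):
    """Collection-specific association thesaurus: two terms are associated
    when they co-occur in at least `min_cooc` documents."""
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            pair_counts[(a, b)] += 1
    thesaurus = {}
    for (a, b), count in pair_counts.items():
        if count >= min_cooc:
            thesaurus.setdefault(a, set()).add(b)
            thesaurus.setdefault(b, set()).add(a)
    return thesaurus

docs = ["query expansion works well",
        "query expansion helps retrieval",
        "unrelated text here"]
print(association_thesaurus(docs))  # → {'expansion': {'query'}, 'query': {'expansion'}}
```

Linked term pairs of this kind form the relational pathways over which a query's concept can be identified and expanded.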
