  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Sistema FOQuE para expansão semântica de consultas baseada em ontologias difusas / FOQuE system for semantic query expansion based on fuzzy ontologies

Yaguinuma, Cristiane Akemi 22 June 2007 (has links)
As data from many areas of knowledge becomes increasingly available, effective retrieval techniques are ever more necessary, both to reduce irrelevant answers and to ensure that relevant results are not missed. In this context, we present the FOQuE system, developed to perform query expansion in order to retrieve semantically relevant and broad results. Based on fuzzy ontologies, the system obtains approximate results that satisfy user requirements according to expansion parameters defined by the user. The additional answers retrieved by FOQuE are classified according to the type of semantic expansion performed and their relevance to the query, making it possible to improve how results are presented to the user. (Work funded by Financiadora de Estudos e Projetos.)
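The expansion described in the abstract can be illustrated with a small sketch. This is not the actual FOQuE implementation: the toy ontology, the threshold parameter, and the `expand_query` helper are all assumptions made for illustration. A fuzzy ontology attaches a membership degree in [0, 1] to each relation, and expansion keeps related terms whose degree meets a user-defined threshold, mirroring the user-set expansion parameters mentioned above.

```python
# Toy fuzzy ontology: term -> {related term: membership degree in [0, 1]}.
# Both the ontology contents and the threshold are illustrative assumptions.
FUZZY_ONTOLOGY = {
    "car": {"automobile": 1.0, "vehicle": 0.8, "truck": 0.5},
    "vehicle": {"car": 0.8, "bicycle": 0.6},
}

def expand_query(terms, threshold=0.7):
    """Return (term, degree) pairs: the original terms plus related
    terms whose fuzzy membership degree is at least `threshold`."""
    expanded = {t: 1.0 for t in terms}
    for t in terms:
        for related, degree in FUZZY_ONTOLOGY.get(t, {}).items():
            if degree >= threshold and related not in expanded:
                expanded[related] = degree
    # Higher-degree expansions rank first, echoing the relevance-based
    # classification of additional answers in the abstract.
    return sorted(expanded.items(), key=lambda kv: -kv[1])

print(expand_query(["car"]))
# → [('car', 1.0), ('automobile', 1.0), ('vehicle', 0.8)]
```

Note that "truck" (degree 0.5) is filtered out by the 0.7 threshold, which is how a user parameter would control how broad the expansion becomes.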
42

APPLYING ENTERPRISE MODELS AS INTERFACE FOR INFORMATION SEARCHING

MATONGO, Tanguy, DEGBELO, Auriol January 2009 (has links)
Nowadays, more and more companies use Enterprise Models to integrate and coordinate their business processes with the aim of remaining competitive in the market. Consequently, Enterprise Models play a critical role in this integration, helping to refine the objectives of the enterprise and the ways to reach them in a given period of time. Through Enterprise Models, companies can improve the management of their operations, actors and processes, and also improve communication within the organisation. This thesis describes another use of Enterprise Models: applying them as an interface for information searching. The underlying motivation is to show that Enterprise Models can be more than just static models; they can be used in a more dynamic way, through a software program for information searching. The program first extracts the information contained in the Enterprise Models (stored as an XML file on the system). Once extracted, this information is used to express a query that is sent to a search engine, which retrieves documents relevant to the query and returns them to the user. The thesis was carried out over an entire academic semester. Its results are a report summarizing the knowledge gained in the field of study, and a software prototype built as a proof of concept to test the theories.
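The extraction-then-query pipeline described above can be sketched in a few lines. The XML schema, element names, and `name` attributes here are invented for illustration; the real enterprise-model format is not specified in the abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical enterprise model stored as XML; the schema is an assumption.
MODEL_XML = """
<model>
  <process name="order handling">
    <actor name="sales department"/>
    <goal name="reduce delivery time"/>
  </process>
</model>
"""

def model_to_query(xml_text):
    """Extract the labels from the model and join them into a query
    string ready to be sent to a search engine."""
    root = ET.fromstring(xml_text)
    # Collect every `name` attribute in document order, de-duplicated.
    terms = []
    for elem in root.iter():
        name = elem.get("name")
        if name and name not in terms:
            terms.append(name)
    return " ".join(terms)

print(model_to_query(MODEL_XML))
# → order handling sales department reduce delivery time
```

In the thesis's pipeline, the resulting string would then be submitted to a search engine and the retrieved documents returned to the user.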
43

Text-Based Information Retrieval Using Relevance Feedback

Krishnan, Sharenya January 2011 (has links)
Europeana, a freely accessible digital library founded by the European Commission in 2008, aims to make Europe's cultural and scientific heritage available to the public. The goal was to deliver semantically enriched digital content with multilingual access to it. Even though the amount of content grew steadily, retrieving information from it in unstructured form gradually became a problem. To complement the Europeana portal services, ASSETS (Advanced Search Service and Enhanced Technological Solutions) was introduced, with services that seek to improve the usability and accessibility of Europeana. My contribution is to study different text-based information retrieval models and their relevance feedback techniques, and to implement one simple model. The thesis gives a detailed overview of the information retrieval process, along with the implementation of the chosen relevance feedback strategy, which generates automatic query expansion. Finally, the thesis concludes with an analysis of the results obtained using relevance feedback, a discussion of the implemented model, and an assessment of its future use, both as a continuation of my work and within ASSETS.
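A common strategy for relevance feedback with automatic query expansion is Rocchio's algorithm; the abstract does not say which model was implemented, so the weights and the term-vector representation below are illustrative assumptions.

```python
# Minimal Rocchio-style relevance feedback: move the query vector toward
# the centroid of relevant documents and away from non-relevant ones.
# Vectors are plain term -> weight dicts; alpha/beta/gamma are the
# conventional default weights, not values taken from the thesis.

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return q' = alpha*q + beta*centroid(relevant)
                 - gamma*centroid(nonrelevant)."""
    new_q = {t: alpha * w for t, w in query.items()}
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for t, w in doc.items():
                new_q[t] = new_q.get(t, 0.0) + coeff * w / len(docs)
    # Negative weights are conventionally clipped to zero.
    return {t: w for t, w in new_q.items() if w > 0}

q = {"heritage": 1.0}
rel = [{"heritage": 1.0, "culture": 1.0}]
print(rocchio(q, rel, []))
# → {'heritage': 1.75, 'culture': 0.75}
```

Terms such as "culture" that appear only in the relevant documents enter the query with a positive weight, which is exactly the automatic query expansion effect the abstract refers to.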
44

Extracting and Aggregating Temporal Events from Texts

Döhling, Lars 11 October 2017 (has links)
Finding reliable information about given events in large and dynamic text collections, such as the web, is a topic of great interest. For instance, rescue teams and insurance companies are interested in concise facts about damages after disasters, which can be found today in web blogs, online newspaper articles, social media, etc. Knowing these facts helps to determine the required scale of relief operations and supports their coordination.
However, finding, extracting, and condensing specific facts is a highly complex undertaking: it requires identifying appropriate textual sources and their temporal alignment, recognizing relevant facts within these texts, and aggregating the extracted facts into a condensed answer despite inconsistencies, uncertainty, and changes over time. In this thesis, we present and evaluate techniques and solutions for each of these problems, embedded in a four-step framework. The applied methods draw on pattern matching, natural language processing, and machine learning. We also report the results of two case studies applying the entire framework: gathering data on earthquakes and floods from web documents. Our results show that, under certain circumstances, it is possible to automatically obtain reliable and timely data from the web.
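The pattern-matching side of such a framework can be approximated with a toy extractor. The regular expressions, the fact schema, and the sample report below are invented for illustration; they are not the thesis's actual extraction rules.

```python
import re

# Hypothetical extractor for two disaster facts: quake magnitude and
# death toll. Real systems need many more patterns plus NLP/ML stages.
MAG = re.compile(r"magnitude[- ](\d+(?:\.\d+)?)", re.I)
DEAD = re.compile(r"(\d[\d,]*)\s+(?:people\s+)?(?:dead|killed|died)", re.I)

def extract_quake_facts(text):
    """Return a dict of the facts found in `text` (possibly empty)."""
    facts = {}
    m = MAG.search(text)
    if m:
        facts["magnitude"] = float(m.group(1))
    d = DEAD.search(text)
    if d:
        facts["deaths"] = int(d.group(1).replace(",", ""))
    return facts

report = "A magnitude-6.3 earthquake struck the region; 1,042 people died."
print(extract_quake_facts(report))
# → {'magnitude': 6.3, 'deaths': 1042}
```

Aggregation across many such reports, resolving the inconsistencies and temporal changes the abstract mentions, would be a separate step on top of this extraction.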
45

Diversified query expansion

Bouchoucha, Arbi 06 1900 (has links)
Search Result Diversification (SRD) aims to select diverse documents from the search results in order to cover as many search intents as possible. Existing approaches presuppose that the initial retrieval results contain diverse documents and ensure good coverage of the query aspects, yet the initial results often fail to cover some aspects. In this thesis, we investigate a new approach to SRD that diversifies the query itself, namely diversified query expansion (DQE), to obtain better aspect coverage. Expansion terms are selected, from a single resource or from multiple resources, following the Maximal Marginal Relevance principle. In the first contribution, we propose a term-level DQE method in which word similarity is determined at the surface (term) level based on the resources. When several resources are used for DQE, the literature has combined them uniformly, ignoring the individual contribution of each resource with respect to the query; in practice, the usefulness of a resource changes greatly depending on the query. In the second contribution, we propose a new method of query-level resource weighting for DQE. It is based on a set of features integrated into a linear regression model, and generates from each resource a number of expansion candidates proportional to the weight of that resource.
Existing DQE methods focus on removing redundancy among the selected expansion terms, and little attention has been paid to how well those terms actually cover the query aspects; it is thus unclear how to cope with the semantic relations between terms. To overcome this drawback, our third contribution introduces a novel method for aspect-level DQE that relies on explicit modeling of query aspects based on embeddings. Our method (called latent semantic aspect embedding) is trained in a supervised manner according to the principle that related terms should correspond to the same aspects. This allows us to select expansion terms at a latent semantic level so as to cover as many aspects of a given query as possible. In addition, the method incorporates several external resources to suggest potential expansion terms, and supports several constraints, such as the sparsity constraint. We evaluate our methods on the ClueWeb09B dataset with three query sets from the TREC Web tracks, and show the usefulness of our approaches compared to state-of-the-art methods.
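The Maximal Marginal Relevance principle mentioned above can be sketched as a greedy selector that trades a term's relevance to the query against its redundancy with already-selected terms. The relevance and similarity numbers are toy values, and `lam` (the trade-off weight) is an assumed parameter, not one from the thesis.

```python
# Greedy MMR selection of expansion terms: at each step pick the term
# maximizing lam*relevance - (1-lam)*max_similarity_to_selected.

def mmr_select(candidates, rel, sim, k=2, lam=0.7):
    """candidates: list of terms; rel[t]: relevance to the query;
    sim[(a, b)]: pairwise term similarity (symmetric, default 0)."""
    def pair_sim(a, b):
        return sim.get((a, b), sim.get((b, a), 0.0))
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        best = max(
            pool,
            key=lambda t: lam * rel[t]
            - (1 - lam) * max((pair_sim(t, s) for s in selected), default=0.0),
        )
        selected.append(best)
        pool.remove(best)
    return selected

rel = {"auto": 0.9, "automobile": 0.85, "bus": 0.6}
sim = {("auto", "automobile"): 0.95}
print(mmr_select(["auto", "automobile", "bus"], rel, sim))
# → ['auto', 'bus']
```

Although "automobile" is more relevant than "bus", its high similarity to the already-selected "auto" makes it redundant, so the diversified expansion prefers "bus". This is exactly the redundancy-removal behavior that the third contribution then extends to the aspect level.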
46

Concept oriented biomedical information retrieval

Shen, Wei 08 1900 (has links)
The health and biomedical domain is probably the area with the richest domain resources. In these resources, the different expressions denoting a concept are grouped together, and relations are defined between concepts. They are designed to facilitate access to health information and are widely believed to be useful for biomedical information retrieval. However, the results of previous work are mixed: in some studies, concepts slightly improve retrieval performance, while in others degradations are observed. These results are difficult to compare directly because they were obtained on different test collections. It thus remains an open question whether and how these resources can help improve biomedical information retrieval. In this thesis, we compare within a single framework two families of approaches to exploiting concepts: using concept IDs as the representation units, and using synonymous concept expressions to expand the original query. Compared with a traditional bag-of-words (BOW) baseline, our experiments on test collections show that the first approach always degrades retrieval effectiveness, whereas the second can lead to improvements. In particular, by matching concept expressions as either strict or flexible phrases, some methods yield significant improvements not only over the BOW baseline but also over the Markov Random Field (MRF) model, a state-of-the-art method, on most query sets. This study shows experimentally that, when concepts are used in a suitable way, they can greatly help improve the effectiveness of biomedical information retrieval. We participated in the ShARe/CLEF 2014 eHealth Evaluation Lab; our result was the best among all participating systems.
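The two families of approaches compared in the thesis can be contrasted in a toy example. The mini concept table below is an invented stand-in for a real terminology such as UMLS, and both helper functions are illustrative assumptions rather than the thesis's implementation.

```python
# Hypothetical concept table: concept ID -> set of synonymous expressions.
CONCEPTS = {
    "C0020538": {"hypertension", "high blood pressure"},
}

def to_concept_ids(query_terms):
    """Family 1: replace each known phrase by its concept ID.
    (The thesis found this representation always hurts effectiveness.)"""
    out = []
    for term in query_terms:
        cid = next((c for c, syns in CONCEPTS.items() if term in syns), term)
        out.append(cid)
    return out

def expand_with_synonyms(query_terms):
    """Family 2: keep the original terms and add synonymous expressions.
    (The thesis found this can improve effectiveness.)"""
    out = list(query_terms)
    for term in query_terms:
        for syns in CONCEPTS.values():
            if term in syns:
                out.extend(sorted(syns - {term}))
    return out

print(to_concept_ids(["hypertension"]))      # → ['C0020538']
print(expand_with_synonyms(["hypertension"]))
# → ['hypertension', 'high blood pressure']
```

The second representation preserves the surface words, which is what allows the strict- or flexible-phrase matching of concept expressions that the abstract credits for the significant improvements.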
47

BOOKISH: Uma ferramenta para contextualização de documentos utilizando mineração de textos e expansão de consulta / BOOKISH: A tool for document contextualization using text mining and query expansion

SILVA, Luciana Oliveira e 14 August 2009 (has links)
The continuous development of technology and its spread into all domains have caused significant changes in society and in education. The new global society demands new skills and provides an opportunity to introduce new technologies into the educational process, improving traditional education systems. The focus should be on the search for meaningful information and research and on the development of projects, rather than on the pure transmission of content. When delivering a lecture on a given subject, teachers often provide additional sources that help students deepen their understanding and carry out activities. Furthermore, it is desirable to have proactive students, capable of interpreting and identifying other sources of information that complement and expand the subject being studied. However, one of today's challenges is information overload: many documents are available, and there are few effective ways to deal with them. Large numbers of documents are stored and made available every day, and although they contain much relevant information, finding it is a difficult task. The BOOKISH system, proposed in this work, assists students in their search activities. By analyzing slide presentations provided by teachers, the tool identifies contextually similar electronic documents and makes them available to students, minimizing the time spent searching for relevant additional material and directing students to the content they need. The tool uses text mining techniques and automatic query expansion for this purpose.
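The matching step described above, scoring candidate documents against text mined from slides, can be sketched with TF-IDF vectors and cosine similarity. The texts and the specific weighting scheme are placeholders; the abstract does not specify BOOKISH's actual scoring.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one term -> tf*idf weight dict per input text."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    n = len(docs)
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Placeholder slide text and candidate documents.
slides = "query expansion text mining"
docs = ["survey of query expansion", "cooking recipes"]
vs, v1, v2 = tfidf_vectors([slides] + docs)
scores = [cosine(vs, v) for v in (v1, v2)]
print(scores.index(max(scores)))  # index of the best-matching document
```

The document sharing vocabulary with the slides scores highest, so it would be the one surfaced to the student as contextually relevant complementary material.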
48

Systematisierung und Evaluierung von Clustering-Verfahren im Information Retrieval / Systematization and evaluation of clustering methods in information retrieval

Kürsten, Jens 02 November 2006 (has links)
This diploma thesis studies cluster analysis methods and their application to optimizing the retrieval results of information retrieval systems. A systematic analysis of the cluster analysis methods established in research forms the basis for a comparative evaluation of promising approaches, carried out through participation in the Domain Specific Monolingual Tasks of the Cross-Language Evaluation Forum 2006.
Selected clustering approaches are implemented within an existing Lucene-based retrieval system. Within the scope of this work, the system is additionally equipped with components for query expansion and data fusion. Both approaches are well established in research on the automatic optimization of retrieval results, and therefore serve as the baseline for assessing the implemented cluster-analysis-based methods. The results show that local document clustering based on the k-means algorithm, combined with a pseudo-relevance feedback approach for selecting the documents used in query expansion, is particularly promising. Furthermore, data fusion based on the Z-score operator proves very useful for combining the retrieval results of different indexing methods; this approach achieves very good and, in particular, very robust retrieval results.
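Z-score data fusion, as summarized above, normalizes each run's scores to zero mean and unit variance before summing them per document, so runs with different score scales can be combined fairly. The runs below are made-up examples; only the normalization scheme follows the abstract.

```python
import statistics

def zscore_fuse(runs):
    """runs: list of {doc_id: score} from different indexing methods.
    Returns doc IDs ranked by the sum of per-run z-scores."""
    fused = {}
    for run in runs:
        scores = list(run.values())
        mu = statistics.mean(scores)
        sigma = statistics.pstdev(scores) or 1.0  # guard against flat runs
        for doc, s in run.items():
            fused[doc] = fused.get(doc, 0.0) + (s - mu) / sigma
    return sorted(fused, key=lambda d: -fused[d])

# Two runs on very different score scales; z-scoring makes them comparable.
run_a = {"d1": 0.9, "d2": 0.5, "d3": 0.1}
run_b = {"d1": 12.0, "d2": 8.0, "d3": 4.0}
print(zscore_fuse([run_a, run_b]))
# → ['d1', 'd2', 'd3']
```

Without normalization, run_b's larger raw scores would dominate the sum; after z-scoring, each run contributes equally, which is one reason score-normalizing fusion operators tend to give robust results.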
