• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 16
  • 9
  • 2
  • 1
  • 1
  • Tagged with
  • 33
  • 33
  • 18
  • 13
  • 11
  • 8
  • 8
  • 8
  • 7
  • 7
  • 7
  • 6
  • 6
  • 6
  • 5
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Towards Collaborative Session-based Semantic Search

Straub, Sebastian 11 October 2017 (has links) (PDF)
In recent years, the most popular web search engines have excelled in their ability to answer short queries that require clear, localized and personalized answers. When it comes to complex exploratory search tasks however, the main challenge for the searcher remains the same as back in the 1990s: Trying to formulate a single query that contains all the right keywords to produce at least some relevant results. In this work we want to investigate new ways to facilitate exploratory search by making use of context information from the user's entire search process. Therefore we present the concept of session-based semantic search, with an optional extension to collaborative search scenarios. To improve the relevance of search results we expand queries with terms from the user's recent query history in the same search context (session-based search). We introduce a novel method for query classification based on statistical topic models which allows us to track the most important topics in a search session so that we can suggest relevant documents that could not be found through keyword matching. To demonstrate the potential of these concepts, we have built the prototype of a session-based semantic search engine which we release as free and open source software. In a qualitative user study that we have conducted, this prototype has shown promising results and was well-received by the participants. / Die führenden Web-Suchmaschinen haben sich in den letzten Jahren gegenseitig darin übertroffen, möglichst leicht verständliche, lokalisierte und personalisierte Antworten auf kurze Suchanfragen anzubieten. Bei komplexen explorativen Rechercheaufgaben hingegen ist die größte Herausforderung für den Nutzer immer noch die gleiche wie in den 1990er Jahren: Eine einzige Suchanfrage so zu formulieren, dass alle notwendigen Schlüsselwörter enthalten sind, um zumindest ein paar relevante Ergebnisse zu erhalten. In der vorliegenden Arbeit sollen neue Methoden entwickelt werden, um die explorative Suche zu erleichtern, indem Kontextinformationen aus dem gesamten Suchprozess des Nutzers einbezogen werden. Daher stellen wir das Konzept der sitzungsbasierten semantischen Suche vor, mit einer optionalen Erweiterung auf kollaborative Suchszenarien. Um die Relevanz von Suchergebnissen zu steigern, werden Suchanfragen mit Begriffen aus den letzten Anfragen des Nutzers angereichert, die im selben Suchkontext gestellt wurden (sitzungsbasierte Suche). Außerdem wird ein neuartiger Ansatz zur Klassifizierung von Suchanfragen eingeführt, der auf statistischen Themenmodellen basiert und es uns ermöglicht, die wichtigsten Themen in einer Suchsitzung zu erkennen, um damit weitere relevante Dokumente vorzuschlagen, die nicht durch Keyword-Matching gefunden werden konnten. Um das Potential dieser Konzepte zu demonstrieren, wurde im Rahmen dieser Arbeit der Prototyp einer sitzungsbasierten semantischen Suchmaschine entwickelt, den wir als freie Software veröffentlichen. In einer qualitativen Nutzerstudie hat dieser Prototyp vielversprechende Ergebnisse hervorgebracht und wurde von den Teilnehmern positiv aufgenommen.
22

Word Embeddings in Database Systems

Günther, Michael 18 November 2021 (has links)
Research in natural language processing (NLP) focuses recently on the development of learned language models called word embedding models like word2vec, fastText, and BERT. Pre-trained on large amounts of unstructured text in natural language, those embedding models constitute a rich source of common knowledge in the domain of the text used for the training. In the NLP community, significant improvements are achieved by using those models together with deep neural network models. To support applications to benefit from word embeddings, we extend the capabilities of traditional relational database systems, which are still by far the most common DBMSs but only provide limited text analysis features. Therefore, we implement (a) novel database operations involving embedding representations to allow a database user to exploit the knowledge encoded in word embedding models for advanced text analysis operations. The integration of those operations into database query language enables users to construct queries using novel word embedding operations in conjunction with traditional query capabilities of SQL. To allow efficient retrieval of embedding representations and fast execution of the operations, we implement (b) novel search algorithms and index structures for approximated kNN-Joins and integrate those into a relational database management system. Moreover, we investigate techniques to optimize embedding representations of text values in database systems. Therefore, we design (c) a novel context adaptation algorithm. This algorithm utilizes the structured data present in the database to enrich the embedding representations of text values to model their context-specific semantic in the database. Besides, we provide (d) support for selecting a word embedding model suitable for a user's application. Therefore, we developed a data processing pipeline to construct a dataset for domain-specific word embedding evaluation. Finally, we propose (e) novel embedding techniques for pre-training on tabular data to support applications working with text values in tables. Our proposed embedding techniques model semantic relations arising from the alignment of words in tabular layouts that can only hardly be derived from text documents, e.g., relations between table schema and table body. In this way, many applications, which either employ embeddings in supervised machine learning models, e.g., to classify cells in spreadsheets, or through the application of arithmetic operations, e.g., table discovery applications, can profit from the proposed embedding techniques.:1 INTRODUCTION 1.1 Contribution 1.2 Outline 2 REPRESENTATION OF TEXT FOR NATURAL LANGUAGE PROCESSING 2.1 Natural Language Processing Systems 2.2 Word Embedding Models 2.2.1 Matrix Factorization Methods 2.2.2 Learned Distributed Representations 2.2.3 Contextualize Word Embeddings 2.2.4 Advantages of Contextualize and Static Word Embeddings 2.2.5 Properties of Static Word Embeddings 2.2.6 Node Embeddings 2.2.7 Non-Euclidean Embedding Techniques 2.3 Evaluation of Word Embeddings 2.3.1 Similarity Evaluation 2.3.2 Analogy Evaluation 2.3.3 Cluster-based Evaluation 2.4 Application for Tabular Data 2.4.1 Semantic Search 2.4.2 Data Curation 2.4.3 Data Discovery 3 SYSTEM OVERVIEW 3.1 Opportunities of an Integration 3.2 Characteristics of Word Vectors 3.3 Objectives and Challenges 3.4 Word Embedding Operations 3.5 Performance Optimization of Operations 3.6 Context Adaptation 3.7 Requirements for Model Recommendation 3.8 Tabular Embedding Models 4 MANAGEMENT OF EMBEDDING REPRESENTATIONS IN DATABASE SYSTEMS 4.1 Integration of Operations in an RDBMS 4.1.1 System Architecture 4.1.2 Storage Formats 4.1.3 User-Defined Functions 4.1.4 Web Application 4.2 Nearest Neighbor Search 4.2.1 Tree-based Methods 4.2.2 Proximity Graphs 4.2.3 Locality-Sensitive Hashing 4.2.4 Quantization Techniques 4.3 Applicability of ANN Techniques for Word Embedding kNN-Joins 4.4 Related Work on kNN Search in Database Systems 4.5 ANN-Joins for Relational Database Systems 4.5.1 Index Architecture 4.5.2 Search Algorithm 4.5.3 Distance Calculation 4.5.4 Optimization Capabilities 4.5.5 Estimation of the Number of Targets 4.5.6 Flexible Product Quantization 4.5.7 Further Optimizations 4.5.8 Parameter Tuning 4.5.9 kNN-Joins for Word2Bits 4.6 Evaluation 4.6.1 Experimental Setup 4.6.2 Influence of Index Parameters on Precision and Execution Time 4.6.3 Performance of Subroutines 4.6.4 Flexible Product Quantization 4.6.5 Accuracy of the Target Size Estimation 4.6.6 Performance of Word2Bits kNN-Join 4.7 Summary 5 CONTEXT ADAPTATION FOR WORD EMBEDDING OPTIMIZATION 5.1 Related Work 5.1.1 Graph and Text Joint Embedding Methods 5.1.2 Retrofitting Approaches 5.1.3 Table Embedding Models 5.2 Relational Retrofitting Approach 5.2.1 Data Preparation 5.2.2 Relational Retrofitting Problem 5.2.3 Relational Retrofitting Algorithm 5.2.4 Online-RETRO 5.3 Evaluation Platform: Retro Live 5.3.1 Functionality 5.3.2 Interface 5.4 Evaluation 5.4.1 Datasets 5.4.2 Training of Embeddings 5.4.3 Machine Learning Models 5.4.4 Evaluation of ML Models 5.4.5 Run-time Measurements 5.4.6 Online Retrofitting 5.5 Summary 6 MODEL RECOMMENDATION 6.1 Related Work 6.1.1 Extrinsic Evaluation 6.1.2 Intrinsic Evaluation 6.2 Architecture of FacetE 6.3 Evaluation Dataset Construction Pipeline 6.3.1 Web Table Filtering and Facet Candidate Generation 6.3.2 Check Soft Functional Dependencies 6.3.3 Post-Filtering 6.3.4 Categorization 6.4 Evaluation of Popular Word Embedding Models 6.4.1 Domain-Agnostic Evaluation 6.4.2 Evaluation of a Single Facet 6.4.3 Evaluation of an Object Set 6.5 Summary 7 TABULAR TEXT EMBEDDINGS 7.1 Related Work 7.1.1 Static Table Embedding Models 7.1.2 Contextualized Table Embedding Models 7.2 Web Table Embedding Model 7.2.1 Preprocessing 7.2.2 Text Serialization 7.2.3 Encoding Model 7.2.4 Embedding Training 7.3 Applications for Table Embeddings 7.3.1 Table Union Search 7.3.2 Classification Tasks 7.4 Evaluation 7.4.1 Intrinsic Evaluation 7.4.2 Table Union Search Evaluation 7.4.3 Table Layout Classification 7.4.4 Spreadsheet Cell Classification 7.5 Summary 8 CONCLUSION 8.1 Summary 8.2 Directions for Future Work BIBLIOGRAPHY LIST OF FIGURES LIST OF TABLES A CONVEXITY OF RELATIONAL RETROFITTING B EVALUATION OF THE RELATIONAL RETROFITTING HYPERPARAMETERS
23

Towards Collaborative Session-based Semantic Search

Straub, Sebastian 11 October 2017 (has links)
In recent years, the most popular web search engines have excelled in their ability to answer short queries that require clear, localized and personalized answers. When it comes to complex exploratory search tasks however, the main challenge for the searcher remains the same as back in the 1990s: Trying to formulate a single query that contains all the right keywords to produce at least some relevant results. In this work we want to investigate new ways to facilitate exploratory search by making use of context information from the user's entire search process. Therefore we present the concept of session-based semantic search, with an optional extension to collaborative search scenarios. To improve the relevance of search results we expand queries with terms from the user's recent query history in the same search context (session-based search). We introduce a novel method for query classification based on statistical topic models which allows us to track the most important topics in a search session so that we can suggest relevant documents that could not be found through keyword matching. To demonstrate the potential of these concepts, we have built the prototype of a session-based semantic search engine which we release as free and open source software. In a qualitative user study that we have conducted, this prototype has shown promising results and was well-received by the participants.:1. Introduction 2. Related Work 2.1. Topic Models 2.1.1. Common Traits 2.1.2. Topic Modeling Techniques 2.1.3. Topic Labeling 2.1.4. Topic Graph Visualization 2.2. Session-based Search 2.3. Query Classification 2.4. Collaborative Search 2.4.1. Aspects of Collaborative Search Systems 2.4.2. Collaborative Information Retrieval Systems 3. Core Concepts 3.1. Session-based Search 3.1.1. Session Data 3.1.2. Query Aggregation 3.2. Topic Centroid 3.2.1. Topic Identification 3.2.2. Topic Shift 3.2.3. Relevance Feedback 3.2.4. Topic Graph Visualization 3.3. Search Strategy 3.3.1. Prerequisites 3.3.2. Search Algorithms 3.3.3. Query Pipeline 3.4. Collaborative Search 3.4.1. Shared Topic Centroid 3.4.2. Group Management 3.4.3. Collaboration 3.5. Discussion 4. Prototype 4.1. Document Collection 4.1.1. Selection Criteria 4.1.2. Data Preparation 4.1.3. Search Index 4.2. Search Engine 4.2.1. Search Algorithms 4.2.2. Query Pipeline 4.2.3. Session Persistence 4.3. User Interface 4.4. Performance Review 4.5. Discussion 5. User Study 5.1. Methods 5.1.1. Procedure 5.1.2. Implementation 5.1.3. Tasks 5.1.4. Questionnaires 5.2. Results 5.2.1. Participants 5.2.2. Task Review 5.2.3. Literature Research Results 5.3. Discussion 6. Conclusion Bibliography Weblinks A. Appendix A.1. Prototype: Source Code A.2. Survey A.2.1. Tasks A.2.2. Document Filter for Google Scholar A.2.3. Questionnaires A.2.4. Participant’s Answers A.2.5. Participant’s Search Results / Die führenden Web-Suchmaschinen haben sich in den letzten Jahren gegenseitig darin übertroffen, möglichst leicht verständliche, lokalisierte und personalisierte Antworten auf kurze Suchanfragen anzubieten. Bei komplexen explorativen Rechercheaufgaben hingegen ist die größte Herausforderung für den Nutzer immer noch die gleiche wie in den 1990er Jahren: Eine einzige Suchanfrage so zu formulieren, dass alle notwendigen Schlüsselwörter enthalten sind, um zumindest ein paar relevante Ergebnisse zu erhalten. In der vorliegenden Arbeit sollen neue Methoden entwickelt werden, um die explorative Suche zu erleichtern, indem Kontextinformationen aus dem gesamten Suchprozess des Nutzers einbezogen werden. Daher stellen wir das Konzept der sitzungsbasierten semantischen Suche vor, mit einer optionalen Erweiterung auf kollaborative Suchszenarien. Um die Relevanz von Suchergebnissen zu steigern, werden Suchanfragen mit Begriffen aus den letzten Anfragen des Nutzers angereichert, die im selben Suchkontext gestellt wurden (sitzungsbasierte Suche). Außerdem wird ein neuartiger Ansatz zur Klassifizierung von Suchanfragen eingeführt, der auf statistischen Themenmodellen basiert und es uns ermöglicht, die wichtigsten Themen in einer Suchsitzung zu erkennen, um damit weitere relevante Dokumente vorzuschlagen, die nicht durch Keyword-Matching gefunden werden konnten. Um das Potential dieser Konzepte zu demonstrieren, wurde im Rahmen dieser Arbeit der Prototyp einer sitzungsbasierten semantischen Suchmaschine entwickelt, den wir als freie Software veröffentlichen. In einer qualitativen Nutzerstudie hat dieser Prototyp vielversprechende Ergebnisse hervorgebracht und wurde von den Teilnehmern positiv aufgenommen.:1. Introduction 2. Related Work 2.1. Topic Models 2.1.1. Common Traits 2.1.2. Topic Modeling Techniques 2.1.3. Topic Labeling 2.1.4. Topic Graph Visualization 2.2. Session-based Search 2.3. Query Classification 2.4. Collaborative Search 2.4.1. Aspects of Collaborative Search Systems 2.4.2. Collaborative Information Retrieval Systems 3. Core Concepts 3.1. Session-based Search 3.1.1. Session Data 3.1.2. Query Aggregation 3.2. Topic Centroid 3.2.1. Topic Identification 3.2.2. Topic Shift 3.2.3. Relevance Feedback 3.2.4. Topic Graph Visualization 3.3. Search Strategy 3.3.1. Prerequisites 3.3.2. Search Algorithms 3.3.3. Query Pipeline 3.4. Collaborative Search 3.4.1. Shared Topic Centroid 3.4.2. Group Management 3.4.3. Collaboration 3.5. Discussion 4. Prototype 4.1. Document Collection 4.1.1. Selection Criteria 4.1.2. Data Preparation 4.1.3. Search Index 4.2. Search Engine 4.2.1. Search Algorithms 4.2.2. Query Pipeline 4.2.3. Session Persistence 4.3. User Interface 4.4. Performance Review 4.5. Discussion 5. User Study 5.1. Methods 5.1.1. Procedure 5.1.2. Implementation 5.1.3. Tasks 5.1.4. Questionnaires 5.2. Results 5.2.1. Participants 5.2.2. Task Review 5.2.3. Literature Research Results 5.3. Discussion 6. Conclusion Bibliography Weblinks A. Appendix A.1. Prototype: Source Code A.2. Survey A.2.1. Tasks A.2.2. Document Filter for Google Scholar A.2.3. Questionnaires A.2.4. Participant’s Answers A.2.5. Participant’s Search Results
24

Trade-offs between Quality and Efficiency in Multilingual Dense Retrieval / Avvägningar mellan kvalitet och effektivitet i f lerspråkig tät informationssökning

Schüldt, Emma January 2022 (has links)
As the amount of content online grows, information retrieval becomes increasingly crucial. Traditional information retrieval does not take the text order into account and is also dependent on exact text matching between the query and the document. Therefore, a query consisting of synonyms to words in a document will not retrieve that document even if it could have been relevant to the user. An alternative approach is dense retrieval which solves these issues by representing the semantic meaning of the query or document using a vector representation. Semantically similar queries and documents are represented with vectors close to each other in a vector space. Vector similarity search can be used to find the most relevant documents for a query. Since the semantic meanings of the words are used, synonyms and paraphrases are handled implicitly. There are several ways to design these representation vectors, either by using one or several vectors to represent each query or document, by changing the dimensionality of the vectors, or by changing the span of values in the vectors. Each option brings its trade-offs in terms of quality of search results, query latency, and index memory footprint. This study experimented with each of the alternatives above. Since most previous research within the area has been done in a monolingual, mainly English context, this study used four different languages to investigate if the trade-offs differed. In this study, the quality, latency, and memory footprint moved in the same direction, i.e., when the quality increased, then the latency increased as well. This was the case for all the languages. For the version that used one vector each for the document and query, decreasing the dimensionality to 128 or 64 gave significant latency improvements but did not affect the quality. For the larger version, which used 32 vectors for the query and 64 for the document, converting the values of vectors to binary had no significant effect on quality but greatly reduced the storage size. / Mängden innehåll på internet växer, och med det behovet av välfungerande informationssökningssystem. Traditionella sökmotorer tar inte hänsyn till ordföljden och är beroende av exakt textmatchning mellan sökfrågan och dokumentet. På grund av detta kommer en sökfråga som innehåller synonymer till ord i ett dokument inte att hämta det dokumentet, även om det hade kunnat vara relevant för användaren. En annan metod är tät informationssökning (en: Dense Retrieval) som löser de här problemen implicit genom att representera den semantiska betydelsen av sökfrågan eller dokumentet med en vektorrepresentation. Semantiskt lika sökfrågor och dokument representeras av närliggande vektorer i ett vektorrum. Likhetssökning med vektorerna kan användas för att hitta de mest relevanta dokumenten för en sökfråga. Eftersom ordens semantiska betydelse används, hanteras synonymer och parafraser implicit. Det finns flera sätt att utforma vektorerna, antingen genom att använda en eller flera vektorer för att representera varje sökfråga eller dokument, genom att ändra vektorernas dimensionalitet, eller genom att ändra spannet för vektorernas värden. Varje alternativ har sina egna för- och nackdelar med avseende på sökresultatens kvalitet, sökningarnas tidsåtgång, och hur mycket minne indexet upptar. I den här studien har vi undersökt alla ovanstående aspekter. Eftersom den mesta tidigare forskningen enbart har gjorts i en engelsk kontext, använder den här studien fyra olika språk för att se om föroch nackdelarna skiljde sig åt mellan de olika språken. I den här studien rörde sig kvaliteten, söktiden och minnesavtrycket i samma riktning, det vill säga när kvaliteten ökade, ökade också söktiden. Detta gällde för alla olika språk. För versionen som använde en vektor vardera för sökfrågan och dokumentet, gav en minskning av dimensionaliteten till 128 eller 64 betydande minskningar av söktiden men förändrade inte kvaliteten. För den större version som använde 32 vektorer för sökfrågan och 64 för dokumentet, gjorde inte en omvandling av vektorernas värden till binära någon skillnad för kvaliteten, men minskade lagringsutrymmet betydligt.
25

ResearchIQ: An End-To-End Semantic Knowledge Platform For Resource Discovery in Biomedical Research

Raje, Satyajeet 20 December 2012 (has links)
No description available.
26

Vers un meilleur accès aux informations pertinentes à l’aide du Web sémantique : application au domaine du e-tourisme / Towards a better access to relevant information with Semantic Web : application to the e-tourism domain

Lully, Vincent 17 December 2018 (has links)
Cette thèse part du constat qu’il y a une infobésité croissante sur le Web. Les deux types d’outils principaux, à savoir le système de recherche et celui de recommandation, qui sont conçus pour nous aider à explorer les données du Web, connaissent plusieurs problématiques dans : (1) l’assistance de la manifestation des besoins d’informations explicites, (2) la sélection des documents pertinents, et (3) la mise en valeur des documents sélectionnés. Nous proposons des approches mobilisant les technologies du Web sémantique afin de pallier à ces problématiques et d’améliorer l’accès aux informations pertinentes. Nous avons notamment proposé : (1) une approche sémantique d’auto-complétion qui aide les utilisateurs à formuler des requêtes de recherche plus longues et plus riches, (2) des approches de recommandation utilisant des liens hiérarchiques et transversaux des graphes de connaissances pour améliorer la pertinence, (3) un framework d’affinité sémantique pour intégrer des données sémantiques et sociales pour parvenir à des recommandations qualitativement équilibrées en termes de pertinence, diversité et nouveauté, (4) des approches sémantiques visant à améliorer la pertinence, l’intelligibilité et la convivialité des explications des recommandations, (5) deux approches de profilage sémantique utilisateur à partir des images, et (6) une approche de sélection des meilleures images pour accompagner les documents recommandés dans les bannières de recommandation. Nous avons implémenté et appliqué nos approches dans le domaine du e-tourisme. Elles ont été dûment évaluées quantitativement avec des jeux de données vérité terrain et qualitativement à travers des études utilisateurs. / This thesis starts with the observation that there is an increasing infobesity on the Web. The two main types of tools, namely the search engine and the recommender system, which are designed to help us explore the Web data, have several problems: (1) in helping users express their explicit information needs, (2) in selecting relevant documents, and (3) in valuing the selected documents. We propose several approaches using Semantic Web technologies to remedy these problems and to improve the access to relevant information. We propose particularly: (1) a semantic auto-completion approach which helps users formulate longer and richer search queries, (2) several recommendation approaches using the hierarchical and transversal links in knowledge graphs to improve the relevance of the recommendations, (3) a semantic affinity framework to integrate semantic and social data to yield qualitatively balanced recommendations in terms of relevance, diversity and novelty, (4) several recommendation explanation approaches aiming at improving the relevance, the intelligibility and the user-friendliness, (5) two image user profiling approaches and (6) an approach which selects the best images to accompany the recommended documents in recommendation banners. We implemented and applied our approaches in the e-tourism domain. They have been properly evaluated quantitatively with ground-truth datasets and qualitatively through user studies.
27

MSSearch: busca semântica de objetos de aprendizagem OBAA com suporte a alinhamento automático de ontologias

Silva, Luiz Rodrigo Jardim da 27 March 2013 (has links)
Submitted by Maicon Juliano Schmidt (maicons) on 2015-07-09T14:56:04Z No. of bitstreams: 1 Luiz Rodrigo Jardim da Silva.pdf: 2565431 bytes, checksum: 6a2df89b794e9afe09546769e43ef4e9 (MD5) / Made available in DSpace on 2015-07-09T14:56:04Z (GMT). No. of bitstreams: 1 Luiz Rodrigo Jardim da Silva.pdf: 2565431 bytes, checksum: 6a2df89b794e9afe09546769e43ef4e9 (MD5) Previous issue date: 2013-01-31 / CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Problemas relacionados à heterogeneidade semântica vêm se mostrando atualmente como um importante campo de pesquisa. Dentro do contexto educacional, pesquisadores têm se dedicado ao desenvolvimento de novas tecnologias que visam melhorar os processos de localização, recuperação, catalogação, e reutilização de objetos de aprendizagem. Baseado neste cenário, destaca-se o uso de técnicas de alinhamento de ontologias para prover integração entre ontologias distintas. Assim, o objetivo deste trabalho é desenvolver uma ferramenta que forneça mecanismos de busca semântica de objetos de aprendizagem com suporte a alinhamento automático de ontologias. / Semantics heterogeneity problems are becoming an important field of research. Within the educational context, researchers have focused on developing new technologies to improve the processes of localization, retrieval, cataloging, and reuse of learning objects. This scenario highlights the use of ontology alignment techniques to provide integration between different ontologies. Therefore, the goal of the present work is to develop a tool that provides mechanisms for semantic search of learning objects, with support for automatic aligning ontologies.
28

Socio-semantic conversational information access

Sahay, Saurav 15 November 2011 (has links)
The main contributions of this thesis revolve around development of an integrated conversational recommendation system, combining data and information models with community network and interactions to leverage multi-modal information access. We have developed a real time conversational information access community agent that leverages community knowledge by pushing relevant recommendations to users of the community. The recommendations are delivered in the form of web resources, past conversation and people to connect to. The information agent (cobot, for community/ collaborative bot) monitors the community conversations, and is 'aware' of users' preferences by implicitly capturing their short term and long term knowledge models from conversations. The agent leverages from health and medical domain knowledge to extract concepts, associations and relationships between concepts; formulates queries for semantic search and provides socio-semantic recommendations in the conversation after applying various relevance filters to the candidate results. The agent also takes into account users' verbal intentions in conversations while making recommendation decision. One of the goals of this thesis is to develop an innovative approach to delivering relevant information using a combination of social networking, information aggregation, semantic search and recommendation techniques. The idea is to facilitate timely and relevant social information access by mixing past community specific conversational knowledge and web information access to recommend and connect users with relevant information. Language and interaction creates usable memories, useful for making decisions about what actions to take and what information to retain. Cobot leverages these interactions to maintain users' episodic and long term semantic models. The agent analyzes these memory structures to match and recommend users in conversations by matching with the contextual information need. The social feedback on the recommendations is registered in the system for the algorithms to promote community preferred, contextually relevant resources. The nodes of the semantic memory are frequent concepts extracted from user's interactions. The concepts are connected with associations that develop when concepts co-occur frequently. Over a period of time when the user participates in more interactions, new concepts are added to the semantic memory. Different conversational facets are matched with episodic memories and a spreading activation search on the semantic net is performed for generating the top candidate user recommendations for the conversation. The tying themes in this thesis revolve around informational and social aspects of a unified information access architecture that integrates semantic extraction and indexing with user modeling and recommendations.
29

GoPubMed: Ontology-based literature search for the life sciences / GoPubMed: ontologie-basierte Literatursuche für die Lebenswissenschaften

Doms, Andreas 20 January 2009 (has links) (PDF)
Background: Most of our biomedical knowledge is only accessible through texts. The biomedical literature grows exponentially and PubMed comprises over 18.000.000 literature abstracts. Recently much effort has been put into the creation of biomedical ontologies which capture biomedical facts. The exploitation of ontologies to explore the scientific literature is a new area of research. Motivation: When people search, they have questions in mind. Answering questions in a domain requires the knowledge of the terminology of that domain. Classical search engines do not provide background knowledge for the presentation of search results. Ontology annotated structured databases allow for data-mining. The hypothesis is that ontology annotated literature databases allow for text-mining. The central problem is to associate scientific publications with ontological concepts. This is a prerequisite for ontology-based literature search. The question then is how to answer biomedical questions using ontologies and a literature corpus. Finally the task is to automate bibliometric analyses on an corpus of scientific publications. Approach: Recent joint efforts on automatically extracting information from free text showed that the applied methods are complementary. The idea is to employ the rich terminological and relational information stored in biomedical ontologies to markup biomedical text documents. Based on established semantic links between documents and ontology concepts the goal is to answer biomedical question on a corpus of documents. The entirely annotated literature corpus allows for the first time to automatically generate bibliometric analyses for ontological concepts, authors and institutions. Results: This work includes a novel annotation framework for free texts with ontological concepts. The framework allows to generate recognition patterns rules from the terminological and relational information in an ontology. Maximum entropy models can be trained to distinguish the meaning of ambiguous concept labels. The framework was used to develop a annotation pipeline for PubMed abstracts with 27,863 Gene Ontology concepts. The evaluation of the recognition performance yielded a precision of 79.9% and a recall of 72.7% improving the previously used algorithm by 25,7% f-measure. The evaluation was done on a manually created (by the original authors) curation corpus of 689 PubMed abstracts with 18,356 curations of concepts. Methods to reason over large amounts of documents with ontologies were developed. The ability to answer questions with the online system was shown on a set of biomedical question of the TREC Genomics Track 2006 benchmark. This work includes the first ontology-based, large scale, online available, up-to-date bibliometric analysis for topics in molecular biology represented by GO concepts. The automatic bibliometric analysis is in line with existing, but often out-dated, manual analyses. Outlook: A number of promising continuations starting from this work have been spun off. A freely available online search engine has a growing user community. A spin-off company was funded by the High-Tech Gründerfonds which commercializes the new ontology-based search paradigm. Several off-springs of GoPubMed including GoWeb (general web search), Go3R (search in replacement, reduction, refinement methods for animal experiments), GoGene (search in gene/protein databases) are developed.
30

GoWeb: Semantic Search and Browsing for the Life Sciences

Dietze, Heiko 21 December 2010 (has links) (PDF)
Searching is a fundamental task to support research. Current search engines are keyword-based. Semantic technologies promise a next generation of semantic search engines, which will be able to answer questions. Current approaches either apply natural language processing to unstructured text or they assume the existence of structured statements over which they can reason. This work provides a system for combining the classical keyword-based search engines with semantic annotation. Conventional search results are annotated using a customized annotation algorithm, which takes the textual properties and requirements such as speed and scalability into account. The biomedical background knowledge consists of the GeneOntology and Medical Subject Headings and other related entities, e.g. proteins/gene names and person names. Together they provide the relevant semantic context for a search engine for the life sciences. We develop the system GoWeb for semantic web search and evaluate it using three benchmarks. It is shown that GoWeb is able to aid question answering with success rates up to 79%. Furthermore, the system also includes semantic hyperlinks that enable semantic browsing of the knowledge space. The semantic hyperlinks facilitate the use of the eScience infrastructure, even complex workflows of composed web services. To complement the web search of GoWeb, other data source and more specialized information needs are tested in different prototypes. This includes patents and intranet search. Semantic search is applicable for these usage scenarios, but the developed systems also show limits of the semantic approach. That is the size, applicability and completeness of the integrated ontologies, as well as technical issues of text-extraction and meta-data information gathering. Additionally, semantic indexing as an alternative approach to implement semantic search is implemented and evaluated with a question answering benchmark. A semantic index can help to answer questions and address some limitations of GoWeb. Still the maintenance and optimization of such an index is a challenge, whereas GoWeb provides a straightforward system.

Page generated in 0.0424 seconds