1 |
An e-librarian service: supporting explorative learning by a description logics based semantic retrieval tool. Linckels, Serge, January 2008
Although educational content in electronic form is increasing dramatically, its use in educational settings remains poor, mainly because there is too much unreliable, redundant, and irrelevant information. Finding appropriate answers is difficult because it relies on the user to filter the pertinent information from the noise. Turning knowledge bases like the online tele-TASK archive into useful educational resources requires identifying correct, reliable, and "machine-understandable" information, as well as developing simple but efficient search tools that can reason over this information.
Our vision is to create an E-Librarian Service that retrieves multimedia resources from a knowledge base more efficiently than browsing through an index or using a simple keyword search. In our E-Librarian Service, users can enter their questions in a simple, human way: in natural language (NL). Our premise is that more pertinent results would be retrieved if the search engine understood the meaning of the user's query. The returned results are then logical consequences of an inference rather than of keyword matching. Our E-Librarian Service does not return the answer to the user's question itself; instead, it retrieves the most pertinent document(s), in which users find the answer to their question.
Among all the documents that share some information with the user query, our E-Librarian Service identifies the most pertinent match(es), bearing in mind that the user expects an exhaustive answer but prefers a concise one with little or no information overhead. Moreover, our E-Librarian Service always proposes a solution to the user, even if the system concludes that no exhaustive answer exists.
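The selection of the most pertinent match(es) described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the query and every clip have already been reduced to sets of ontology concept names; the actual service reasons with description logics, which this set comparison only approximates. `Match`, `rank_documents`, the concept names, and the two-level scoring rule (fewest missing query concepts first, then least overhead) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    doc_id: str
    missing: frozenset   # query concepts the document does not cover
    overhead: frozenset  # document concepts the query did not ask for

    @property
    def exhaustive(self):
        return not self.missing


def _score(match):
    # Hypothetical ranking: exhaustiveness first, then conciseness.
    return (len(match.missing), len(match.overhead))


def rank_documents(query, documents):
    """Return the most pertinent documents for a set of query concepts.

    Documents sharing no concept with the query are dropped, but the best
    partial matches are still returned, so the user always gets a proposal.
    """
    query = frozenset(query)
    matches = []
    for doc_id, concepts in documents.items():
        concepts = frozenset(concepts)
        if concepts & query:  # some common information with the query
            matches.append(Match(doc_id, query - concepts, concepts - query))
    if not matches:
        return []
    best = min(_score(m) for m in matches)
    return sorted((m for m in matches if _score(m) == best), key=lambda m: m.doc_id)


# Hypothetical clip descriptions, already reduced to concept names.
CLIPS = {
    "eniac-inventors": {"ENIAC", "Inventor"},
    "eniac-overview": {"ENIAC", "Inventor", "Computer", "VacuumTube"},
    "z3": {"Z3", "Zuse", "Inventor"},
}

print([m.doc_id for m in rank_documents({"ENIAC", "Inventor"}, CLIPS)])
# ['eniac-inventors']
```

Because partial matches are kept, a query such as {"ENIAC", "Zuse"} still returns "eniac-inventors", flagged as non-exhaustive, rather than nothing.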
Our E-Librarian Service was implemented as a prototype in three different educational tools. The first prototype is CHESt (Computer History Expert System), with a knowledge base of 300 multimedia clips covering the main events in computer history. The second prototype is MatES (Mathematics Expert System), with a knowledge base of 115 clips covering fractions in secondary-school mathematics, following the official school curriculum. The clips were recorded mainly by pupils. The third and most advanced prototype is the "Lecture Butler's E-Librarian Service"; it has a Web service interface that conforms to a service-oriented architecture (SOA) and was developed in the context of the Web-University project at the Hasso-Plattner-Institute (HPI).
Two major experiments in an educational environment, at the Lycée Technique Esch/Alzette in Luxembourg, were conducted to test the pertinence and reliability of our E-Librarian Service as a complement to traditional courses. The first experiment (2005) used CHESt in different classes and covered a single lesson. The second experiment (2006) covered six weeks of intensive use of MatES in one class. There were no classical mathematics lessons in which the teacher gave explanations; instead, the students had to learn autonomously and exploratively, asking the E-Librarian Service questions just as they would ask a human teacher.
Tests have shown that, among all the documents in a given knowledge base, our E-Librarian Service finds those that semantically best match the user's query. In addition, the system can quantify and visualize for the user the quality and pertinence of the returned answers.
To interpret the natural-language input, linguistic information and a given context in the form of an ontology are used to translate the user's question semantically into a logical form.
The main results of the experiments are as follows:
First, students generally accept entering complete questions instead of keywords if this helps them obtain better search results.
Second, and most importantly, the experiments showed that school results can improve when students use our E-Librarian Service. We measured an overall improvement of 5% in school results; 50% of the students improved their grades, 41% of them significantly.
One of the main reasons for these positive results is that the students were more motivated and consequently willing to invest more effort and diligence in learning and acquiring new knowledge.
|
2 |
Towards Collaborative Session-based Semantic Search. Straub, Sebastian, 11 October 2017
In recent years, the most popular web search engines have excelled at answering short queries that require clear, localized, and personalized answers. For complex exploratory search tasks, however, the main challenge for the searcher remains the same as in the 1990s: formulating a single query that contains all the right keywords to produce at least some relevant results.
In this work, we investigate new ways to facilitate exploratory search by using context information from the user's entire search process. To this end, we present the concept of session-based semantic search, with an optional extension to collaborative search scenarios. To improve the relevance of search results, we expand queries with terms from the user's recent query history in the same search context (session-based search). We also introduce a novel method for query classification based on statistical topic models, which lets us track the most important topics in a search session and suggest relevant documents that keyword matching alone would not find.
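The two mechanisms in the paragraph above, expanding a query from the session history and tracking a session's dominant topics, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: the exponential recency weighting in `expand_query` and the moving-average update in `update_centroid` are hypothetical choices, and the topic distributions are assumed to come from some external topic model.

```python
from collections import Counter


def expand_query(query, history, max_terms=3, decay=0.5):
    """Append the most heavily weighted terms from earlier session queries.

    A term from the query `age` steps back contributes decay ** age, so the
    previous query weighs most. Terms already in the query are skipped; ties
    are broken alphabetically to keep the output deterministic.
    """
    current = query.lower().split()
    weights = Counter()
    for age, past in enumerate(reversed(history), start=1):
        for term in set(past.lower().split()):
            weights[term] += decay ** age
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    extra = [term for term, _ in ranked if term not in current]
    return current + extra[:max_terms]


def update_centroid(centroid, query_topics, alpha=0.3):
    """Move the session's topic centroid toward the newest query's topics.

    Both arguments map topic IDs to probabilities (e.g. a topic model's
    distribution for a query); alpha controls how fast the centroid follows
    a topic shift.
    """
    topics = set(centroid) | set(query_topics)
    return {
        t: (1 - alpha) * centroid.get(t, 0.0) + alpha * query_topics.get(t, 0.0)
        for t in topics
    }


history = ["neural ranking models", "neural query expansion"]
print(expand_query("session search", history))
# ['session', 'search', 'neural', 'expansion', 'query']
```

A small `alpha` keeps the centroid stable across noisy queries, while a large one reacts quickly when the user changes topic.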
To demonstrate the potential of these concepts, we built a prototype session-based semantic search engine, which we release as free and open-source software. In a qualitative user study we conducted, the prototype showed promising results and was well received by the participants.
|
3 |
Word Embeddings in Database Systems. Günther, Michael, 18 November 2021
Research in natural language processing (NLP) has recently focused on learned language models known as word embedding models, such as word2vec, fastText, and BERT. Pre-trained on large amounts of unstructured natural-language text, these models are a rich source of common knowledge about the domain of their training text. In the NLP community, significant improvements have been achieved by combining these models with deep neural networks. To let applications benefit from word embeddings, we extend the capabilities of traditional relational database systems, which are still by far the most common DBMSs but provide only limited text analysis features. To this end, we implement (a) novel database operations on embedding representations that allow database users to exploit the knowledge encoded in word embedding models for advanced text analysis. Integrating these operations into the database query language lets users construct queries that combine the novel word embedding operations with the traditional query capabilities of SQL. To enable efficient retrieval of embedding representations and fast execution of the operations, we implement (b) novel search algorithms and index structures for approximate kNN-joins and integrate them into a relational database management system. Moreover, we investigate techniques to optimize the embedding representations of text values in database systems. To this end, we design (c) a novel context adaptation algorithm, which uses the structured data in the database to enrich the embedding representations of text values so that they model their context-specific semantics in the database. In addition, we provide (d) support for selecting a word embedding model suitable for a user's application: we developed a data processing pipeline that constructs a dataset for domain-specific word embedding evaluation.
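To make operation (b) concrete, the following Python sketch computes an exact kNN-join over word embeddings by cosine similarity, i.e. the result that approximate index structures try to reproduce faster. It is a brute-force baseline, not the thesis's index or search algorithm; inside a database such an operation would be exposed to SQL (for example as a user-defined function), and the toy two-dimensional vectors are hypothetical.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_join(queries, targets, k):
    """Exact kNN-join: for each query term, its k most similar target terms.

    queries, targets: dicts mapping a term to its embedding vector.
    Runs in O(|queries| * |targets| * d); approximate techniques such as
    product quantization exist to avoid exactly this cost.
    """
    result = {}
    for term, vec in queries.items():
        scored = ((cosine(vec, tvec), target) for target, tvec in targets.items())
        result[term] = [target for _, target in heapq.nlargest(k, scored)]
    return result


toy = {"queen": [0.8, 0.2], "apple": [0.1, 0.9], "prince": [1.0, 0.0]}
print(knn_join({"king": [0.9, 0.1]}, toy, k=2))
# {'king': ['prince', 'queen']}
```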
Finally, we propose (e) novel embedding techniques for pre-training on tabular data to support applications that work with text values in tables. These techniques model semantic relations that arise from the alignment of words in tabular layouts and can hardly be derived from text documents, e.g., relations between the table schema and the table body. In this way, many applications can benefit from the proposed embedding techniques, whether they employ embeddings in supervised machine learning models (e.g., to classify cells in spreadsheets) or apply arithmetic operations to them (e.g., table discovery applications).
1 INTRODUCTION
1.1 Contribution
1.2 Outline
2 REPRESENTATION OF TEXT FOR NATURAL LANGUAGE PROCESSING
2.1 Natural Language Processing Systems
2.2 Word Embedding Models
2.2.1 Matrix Factorization Methods
2.2.2 Learned Distributed Representations
2.2.3 Contextualized Word Embeddings
2.2.4 Advantages of Contextualized and Static Word Embeddings
2.2.5 Properties of Static Word Embeddings
2.2.6 Node Embeddings
2.2.7 Non-Euclidean Embedding Techniques
2.3 Evaluation of Word Embeddings
2.3.1 Similarity Evaluation
2.3.2 Analogy Evaluation
2.3.3 Cluster-based Evaluation
2.4 Application for Tabular Data
2.4.1 Semantic Search
2.4.2 Data Curation
2.4.3 Data Discovery
3 SYSTEM OVERVIEW
3.1 Opportunities of an Integration
3.2 Characteristics of Word Vectors
3.3 Objectives and Challenges
3.4 Word Embedding Operations
3.5 Performance Optimization of Operations
3.6 Context Adaptation
3.7 Requirements for Model Recommendation
3.8 Tabular Embedding Models
4 MANAGEMENT OF EMBEDDING REPRESENTATIONS IN DATABASE SYSTEMS
4.1 Integration of Operations in an RDBMS
4.1.1 System Architecture
4.1.2 Storage Formats
4.1.3 User-Defined Functions
4.1.4 Web Application
4.2 Nearest Neighbor Search
4.2.1 Tree-based Methods
4.2.2 Proximity Graphs
4.2.3 Locality-Sensitive Hashing
4.2.4 Quantization Techniques
4.3 Applicability of ANN Techniques for Word Embedding kNN-Joins
4.4 Related Work on kNN Search in Database Systems
4.5 ANN-Joins for Relational Database Systems
4.5.1 Index Architecture
4.5.2 Search Algorithm
4.5.3 Distance Calculation
4.5.4 Optimization Capabilities
4.5.5 Estimation of the Number of Targets
4.5.6 Flexible Product Quantization
4.5.7 Further Optimizations
4.5.8 Parameter Tuning
4.5.9 kNN-Joins for Word2Bits
4.6 Evaluation
4.6.1 Experimental Setup
4.6.2 Influence of Index Parameters on Precision and Execution Time
4.6.3 Performance of Subroutines
4.6.4 Flexible Product Quantization
4.6.5 Accuracy of the Target Size Estimation
4.6.6 Performance of Word2Bits kNN-Join
4.7 Summary
5 CONTEXT ADAPTATION FOR WORD EMBEDDING OPTIMIZATION
5.1 Related Work
5.1.1 Graph and Text Joint Embedding Methods
5.1.2 Retrofitting Approaches
5.1.3 Table Embedding Models
5.2 Relational Retrofitting Approach
5.2.1 Data Preparation
5.2.2 Relational Retrofitting Problem
5.2.3 Relational Retrofitting Algorithm
5.2.4 Online-RETRO
5.3 Evaluation Platform: Retro Live
5.3.1 Functionality
5.3.2 Interface
5.4 Evaluation
5.4.1 Datasets
5.4.2 Training of Embeddings
5.4.3 Machine Learning Models
5.4.4 Evaluation of ML Models
5.4.5 Run-time Measurements
5.4.6 Online Retrofitting
5.5 Summary
6 MODEL RECOMMENDATION
6.1 Related Work
6.1.1 Extrinsic Evaluation
6.1.2 Intrinsic Evaluation
6.2 Architecture of FacetE
6.3 Evaluation Dataset Construction Pipeline
6.3.1 Web Table Filtering and Facet Candidate Generation
6.3.2 Check Soft Functional Dependencies
6.3.3 Post-Filtering
6.3.4 Categorization
6.4 Evaluation of Popular Word Embedding Models
6.4.1 Domain-Agnostic Evaluation
6.4.2 Evaluation of a Single Facet
6.4.3 Evaluation of an Object Set
6.5 Summary
7 TABULAR TEXT EMBEDDINGS
7.1 Related Work
7.1.1 Static Table Embedding Models
7.1.2 Contextualized Table Embedding Models
7.2 Web Table Embedding Model
7.2.1 Preprocessing
7.2.2 Text Serialization
7.2.3 Encoding Model
7.2.4 Embedding Training
7.3 Applications for Table Embeddings
7.3.1 Table Union Search
7.3.2 Classification Tasks
7.4 Evaluation
7.4.1 Intrinsic Evaluation
7.4.2 Table Union Search Evaluation
7.4.3 Table Layout Classification
7.4.4 Spreadsheet Cell Classification
7.5 Summary
8 CONCLUSION
8.1 Summary
8.2 Directions for Future Work
BIBLIOGRAPHY
LIST OF FIGURES
LIST OF TABLES
A CONVEXITY OF RELATIONAL RETROFITTING
B EVALUATION OF THE RELATIONAL RETROFITTING HYPERPARAMETERS
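The context adaptation algorithm (c) belongs to the family of retrofitting approaches. As a point of reference, the sketch below implements classic retrofitting (Faruqui et al., 2015) in plain Python; it is not the thesis's relational retrofitting. Treating values that share a table row as neighbours, the movie/director example, and the default weights are assumptions made for illustration.

```python
def retrofit(embeddings, neighbours, iterations=10, alpha=1.0, beta=1.0):
    """Classic retrofitting (after Faruqui et al., 2015).

    embeddings: dict mapping a text value to its original vector (list of
    floats); the originals are never modified.
    neighbours: dict mapping a text value to related values (assumed here:
    values that share a table row).
    Each pass replaces a vector by the weighted mean of its original vector
    (weight alpha) and its neighbours' current vectors (weight beta each).
    """
    new = {term: list(vec) for term, vec in embeddings.items()}
    for _ in range(iterations):
        for term, related in neighbours.items():
            related = [r for r in related if r in new]
            if term not in new or not related:
                continue  # nothing to adapt towards
            acc = [alpha * x for x in embeddings[term]]
            for r in related:
                acc = [a + beta * b for a, b in zip(acc, new[r])]
            total = alpha + beta * len(related)
            new[term] = [a / total for a in acc]
    return new


# Hypothetical movies table row: ("Inception", "Nolan").
vectors = {"Inception": [0.0, 0.0], "Nolan": [1.0, 1.0]}
rows = {"Inception": ["Nolan"], "Nolan": ["Inception"]}
print(retrofit(vectors, rows, iterations=1))
# {'Inception': [0.5, 0.5], 'Nolan': [0.75, 0.75]}
```

Updates are applied in place within a pass, so "Nolan" is already pulled toward the updated "Inception" vector; keeping the original vector in every update (weight `alpha`) stops the values from collapsing into one point.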
|
4 |
Towards Collaborative Session-based Semantic Search. Straub, Sebastian, 11 October 2017
1. Introduction
2. Related Work
2.1. Topic Models
2.1.1. Common Traits
2.1.2. Topic Modeling Techniques
2.1.3. Topic Labeling
2.1.4. Topic Graph Visualization
2.2. Session-based Search
2.3. Query Classification
2.4. Collaborative Search
2.4.1. Aspects of Collaborative Search Systems
2.4.2. Collaborative Information Retrieval Systems
3. Core Concepts
3.1. Session-based Search
3.1.1. Session Data
3.1.2. Query Aggregation
3.2. Topic Centroid
3.2.1. Topic Identification
3.2.2. Topic Shift
3.2.3. Relevance Feedback
3.2.4. Topic Graph Visualization
3.3. Search Strategy
3.3.1. Prerequisites
3.3.2. Search Algorithms
3.3.3. Query Pipeline
3.4. Collaborative Search
3.4.1. Shared Topic Centroid
3.4.2. Group Management
3.4.3. Collaboration
3.5. Discussion
4. Prototype
4.1. Document Collection
4.1.1. Selection Criteria
4.1.2. Data Preparation
4.1.3. Search Index
4.2. Search Engine
4.2.1. Search Algorithms
4.2.2. Query Pipeline
4.2.3. Session Persistence
4.3. User Interface
4.4. Performance Review
4.5. Discussion
5. User Study
5.1. Methods
5.1.1. Procedure
5.1.2. Implementation
5.1.3. Tasks
5.1.4. Questionnaires
5.2. Results
5.2.1. Participants
5.2.2. Task Review
5.2.3. Literature Research Results
5.3. Discussion
6. Conclusion
Bibliography
Weblinks
A. Appendix
A.1. Prototype: Source Code
A.2. Survey
A.2.1. Tasks
A.2.2. Document Filter for Google Scholar
A.2.3. Questionnaires
A.2.4. Participant’s Answers
A.2.5. Participant’s Search Results
|
5 |
GoPubMed: Ontology-based literature search for the life sciences. Doms, Andreas, 20 January 2009
Background: Most of our biomedical knowledge is accessible only through texts. The biomedical literature grows exponentially, and PubMed comprises over 18,000,000 literature abstracts. Recently, much effort has been put into creating biomedical ontologies that capture biomedical facts. Exploiting ontologies to explore the scientific literature is a new area of research. Motivation: When people search, they have questions in mind. Answering questions in a domain requires knowledge of that domain's terminology. Classical search engines do not provide background knowledge for the presentation of search results. Ontology-annotated structured databases enable data mining; the hypothesis is that ontology-annotated literature databases enable text mining. The central problem is to associate scientific publications with ontological concepts, a prerequisite for ontology-based literature search. The question then is how to answer biomedical questions using ontologies and a literature corpus. Finally, the task is to automate bibliometric analyses on a corpus of scientific publications. Approach: Recent joint efforts on automatically extracting information from free text showed that the applied methods are complementary. The idea is to employ the rich terminological and relational information stored in biomedical ontologies to mark up biomedical text documents. Based on the established semantic links between documents and ontology concepts, the goal is to answer biomedical questions on a corpus of documents. The fully annotated literature corpus makes it possible, for the first time, to generate bibliometric analyses automatically for ontological concepts, authors, and institutions. Results: This work includes a novel framework for annotating free text with ontological concepts. The framework generates recognition pattern rules from the terminological and relational information in an ontology.
Maximum entropy models can be trained to distinguish between the meanings of ambiguous concept labels. The framework was used to develop an annotation pipeline for PubMed abstracts with 27,863 Gene Ontology concepts. Evaluation of the recognition performance yielded a precision of 79.9% and a recall of 72.7%, improving on the previously used algorithm by 25.7% in F-measure. The evaluation used a curation corpus of 689 PubMed abstracts with 18,356 concept curations, created manually by the original authors. Methods to reason over large numbers of documents with ontologies were developed. The online system's ability to answer questions was demonstrated on a set of biomedical questions from the TREC Genomics Track 2006 benchmark. This work includes the first ontology-based, large-scale, online-available, up-to-date bibliometric analysis for topics in molecular biology represented by GO concepts. The automatic bibliometric analysis is consistent with existing, but often outdated, manual analyses. Outlook: Several promising continuations of this work have been spun off. A freely available online search engine has a growing user community. A spin-off company that commercializes the new ontology-based search paradigm was funded by the High-Tech Gründerfonds. Several offshoots of GoPubMed have been developed, including GoWeb (general web search), Go3R (search for replacement, reduction, and refinement methods for animal experiments), and GoGene (search in gene/protein databases).
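The annotation step and the reported scores can be illustrated together. The Python sketch below annotates text by greedy longest match against a dictionary of concept labels, a much simpler stand-in for the framework's generated recognition patterns and maximum entropy disambiguation, and computes the balanced F-measure, assuming that is the F-measure reported. With precision 0.799 and recall 0.727, F1 = 2PR/(P+R) ≈ 0.761. The two-entry label dictionary is an illustrative example only.

```python
def annotate(text, labels, max_len=4):
    """Greedy longest-match annotation of text with ontology concepts.

    labels: dict mapping a lower-cased concept label (name or synonym) to a
    concept ID. At each position the longest label of up to max_len tokens
    wins; unmatched tokens are skipped. No disambiguation is attempted.
    """
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in labels:
                found.append(labels[phrase])
                i += n
                break
        else:  # no label starts at this token
            i += 1
    return found


def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


labels = {"apoptosis": "GO:0006915", "cell proliferation": "GO:0008283"}
print(annotate("Apoptosis and cell proliferation in tumours.", labels))
# ['GO:0006915', 'GO:0008283']
print(round(f_measure(0.799, 0.727), 3))  # 0.761
```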
|
6 |
GoWeb: Semantic Search and Browsing for the Life Sciences. Dietze, Heiko, 21 December 2010
Searching is a fundamental task in supporting research. Current search engines are keyword-based. Semantic technologies promise a next generation of semantic search engines that will be able to answer questions. Current approaches either apply natural language processing to unstructured text or assume the existence of structured statements over which they can reason.
This work presents a system that combines classical keyword-based search engines with semantic annotation. Conventional search results are annotated using a customized annotation algorithm that takes into account the properties of the text as well as requirements such as speed and scalability. The biomedical background knowledge consists of the Gene Ontology and the Medical Subject Headings, together with related entities such as protein/gene names and person names. Together, they provide the relevant semantic context for a search engine for the life sciences. We developed GoWeb, a system for semantic web search, and evaluated it on three benchmarks, showing that GoWeb aids question answering with success rates of up to 79%.
Furthermore, the system includes semantic hyperlinks that enable semantic browsing of the knowledge space. These semantic hyperlinks facilitate the use of the eScience infrastructure, including complex workflows of composed web services.
To complement GoWeb's web search, other data sources and more specialized information needs are tested in different prototypes, including patent search and intranet search. Semantic search is applicable in these usage scenarios, but the developed systems also reveal the limits of the semantic approach: the size, applicability, and completeness of the integrated ontologies, as well as technical issues with text extraction and metadata gathering.
Additionally, semantic indexing, an alternative way to implement semantic search, is implemented and evaluated with a question answering benchmark. A semantic index can help answer questions and addresses some limitations of GoWeb. Still, maintaining and optimizing such an index is a challenge, whereas GoWeb provides a straightforward system.
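A semantic index of the kind mentioned above can be sketched in Python as an inverted index from ontology concepts to documents, in which a document annotated with a concept is also indexed under all of that concept's ancestors in the is-a hierarchy. The functions, concept names, and hierarchy are hypothetical and do not describe the evaluated index. The sketch also hints at the maintenance cost the abstract mentions: changing the hierarchy means re-indexing every affected document.

```python
from collections import defaultdict


def ancestors(concept, parents):
    """All concepts reachable via is-a links, including the concept itself."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def build_semantic_index(annotations, parents):
    """Map each concept to the documents annotated with it or a sub-concept."""
    index = defaultdict(set)
    for doc_id, concepts in annotations.items():
        for c in concepts:
            for a in ancestors(c, parents):
                index[a].add(doc_id)
    return index


def search(index, query_concepts):
    """Documents annotated with every query concept (or one of its sub-concepts)."""
    sets = [index.get(c, set()) for c in query_concepts]
    return set.intersection(*sets) if sets else set()


parents = {"apoptosis": ["cell death"], "necrosis": ["cell death"]}
docs = {"doc1": {"apoptosis", "p53"}, "doc2": {"necrosis"}, "doc3": {"p53"}}
index = build_semantic_index(docs, parents)
print(sorted(search(index, ["cell death", "p53"])))  # ['doc1']
```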
|
7 |
GoWeb: Semantic Search and Browsing for the Life Sciences. Dietze, Heiko, 20 October 2010
|
8 |
GoPubMed: Ontology-based literature search for the life sciences. Doms, Andreas, 06 January 2009
|