Global ETD Search

1	Word-sense disambiguation in biomedical ontologies Alexopoulou, Dimitra 12 January 2011 (has links) (PDF) With the ever increase in biomedical literature, text-mining has emerged as an important technology to support bio-curation and search. Word sense disambiguation (WSD), the correct identification of terms in text in the light of ambiguity, is an important problem in text-mining. Since the late 1940s many approaches based on supervised (decision trees, naive Bayes, neural networks, support vector machines) and unsupervised machine learning (context-clustering, word-clustering, co-occurrence graphs) have been developed. Knowledge-based methods that make use of the WordNet computational lexicon have also been developed. But only few make use of ontologies, i.e. hierarchical controlled vocabularies, to solve the problem and none exploit inference over ontologies and the use of metadata from publications. This thesis addresses the WSD problem in biomedical ontologies by suggesting diﬀerent approaches for word sense disambiguation that use ontologies and metadata. The "Closest Sense" method assumes that the ontology deﬁnes multiple senses of the term; it computes the shortest path of co-occurring terms in the document to one of these senses. The "Term Cooc" method deﬁnes a log-odds ratio for co-occurring terms including inferred co-occurrences. The "MetaData" approach trains a classiﬁer on metadata; it does not require any ontology, but requires training data, which the other methods do not. These approaches are compared to each other when applied to a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. All approaches over all conditions achieve 80% success rate on average. The MetaData approach performs best with 96%, when trained on high-quality data. Its performance deteriorates as quality of the training data decreases. The Term Cooc approach performs better on Gene Ontology (92% success) than on MeSH (73% success) as MeSH is not a strict is-a/part-of, but rather a loose is-related-to hierarchy. The Closest Sense approach achieves on average 80% success rate. Furthermore, the thesis showcases applications ranging from ontology design to semantic search where WSD is important. word sense disambiguation WSD ontologies information retrieval Gene Ontology GO Medical Subject Headings MeSH Wortbedeutung Disambiguierung WSD Ontologie Information Retrieval Gene Ontologie GO Medical Subject Headings MeSH ddc:004 rvk:ST 640
2	Word-sense disambiguation in biomedical ontologies Alexopoulou, Dimitra 11 June 2010 (has links) With the ever increase in biomedical literature, text-mining has emerged as an important technology to support bio-curation and search. Word sense disambiguation (WSD), the correct identification of terms in text in the light of ambiguity, is an important problem in text-mining. Since the late 1940s many approaches based on supervised (decision trees, naive Bayes, neural networks, support vector machines) and unsupervised machine learning (context-clustering, word-clustering, co-occurrence graphs) have been developed. Knowledge-based methods that make use of the WordNet computational lexicon have also been developed. But only few make use of ontologies, i.e. hierarchical controlled vocabularies, to solve the problem and none exploit inference over ontologies and the use of metadata from publications. This thesis addresses the WSD problem in biomedical ontologies by suggesting diﬀerent approaches for word sense disambiguation that use ontologies and metadata. The "Closest Sense" method assumes that the ontology deﬁnes multiple senses of the term; it computes the shortest path of co-occurring terms in the document to one of these senses. The "Term Cooc" method deﬁnes a log-odds ratio for co-occurring terms including inferred co-occurrences. The "MetaData" approach trains a classiﬁer on metadata; it does not require any ontology, but requires training data, which the other methods do not. These approaches are compared to each other when applied to a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. All approaches over all conditions achieve 80% success rate on average. The MetaData approach performs best with 96%, when trained on high-quality data. Its performance deteriorates as quality of the training data decreases. The Term Cooc approach performs better on Gene Ontology (92% success) than on MeSH (73% success) as MeSH is not a strict is-a/part-of, but rather a loose is-related-to hierarchy. The Closest Sense approach achieves on average 80% success rate. Furthermore, the thesis showcases applications ranging from ontology design to semantic search where WSD is important. info:eu-repo/classification/ddc/004 ddc:004

Search results

Word-sense disambiguation in biomedical ontologies

Word-sense disambiguation in biomedical ontologies