Ontology learning (OL) from text broadly consists of three stages: natural language processing (NLP), knowledge extraction, and ontology construction. While NLP and information extraction for Swedish are approaching the level available for English, these methods have not been assembled into a full ontology learning pipeline. As a result, there is currently very little automated support for using knowledge from Swedish literature in semantically enabled systems. This thesis demonstrates the feasibility of applying some existing OL methods to Swedish text and elicits proposals for further work toward building and studying open-domain ontology learning systems for Swedish and perhaps multiple languages. This is done by building a prototype ontology learning system based on the state-of-the-art architecture of such systems, using the Korp NLP framework for Swedish text and the GATE system for corpus and annotation management, and embedding it as a self-contained plugin in the Protégé ontology engineering framework. The prototype is evaluated in a manner similar to other OL systems. As expected, while the prototype is sufficient to demonstrate feasibility, the ontology produced in the evaluation is not usable in practice: many more methods and fewer cascading errors are needed to model the domain richly and accurately. Beyond implementing more methods to extract more ontology elements, a framework for programmatically defining knowledge extraction and ontology construction methods and their dependencies is recommended to enable more effective research on and application of ontology learning.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-245334 |
Date | January 2015 |
Creators | Bothma, Bothma |
Publisher | Uppsala universitet, Institutionen för informationsteknologi |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | IT ; 15006 |