This thesis details the creation of ASKNet (Automated Semantic Knowledge Network), a system for creating large-scale semantic networks from natural language texts. Using ASKNet as an example, we will show that by using existing natural language processing (NLP) tools, combined with a novel use of spreading activation theory, it is possible to efficiently create high-quality semantic networks on a scale never before achievable. The ASKNet system takes naturally occurring English text (e.g., newspaper articles) and processes it using existing NLP tools. It then uses the output of those tools to create semantic network fragments representing the meaning of each sentence in the text. Those fragments are then combined by a spreading-activation-based algorithm that attempts to decide which portions of the networks refer to the same real-world entity. This allows ASKNet to combine the small fragments into a single cohesive resource, which has more expressive power than the sum of its parts. Systems aiming to build semantic resources have typically either overlooked information integration completely, or else dismissed it as being AI-complete, and thus unachievable. In this thesis we will show that information integration is both an integral component of any semantic resource, and achievable through a combination of NLP technologies and novel applications of spreading activation theory. While extraction and integration of all knowledge within a text may be AI-complete, we will show that by processing large quantities of text efficiently, we can compensate for minor processing errors and missed relations with volume and creation speed. If relations are too difficult to extract, or we are unsure which nodes should integrate at any given stage, we can simply leave them to be picked up later when we have more information or come across a document that explains the concept more clearly. ASKNet is primarily designed as a proof-of-concept system. However, this thesis will show that it is capable of creating semantic networks larger than any existing similar resource in a matter of days, and furthermore that the networks it creates are of sufficient quality to be used for real-world tasks. We will demonstrate that ASKNet can be used to judge semantic relatedness of words, achieving results comparable to the best state-of-the-art systems.
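To make the idea of spreading activation concrete, the following is a minimal, generic sketch and not ASKNet's actual algorithm, which the abstract does not specify. The function name `spread_activation`, the parameters `decay`, `threshold`, and `max_steps`, and the toy graph are all assumptions introduced for illustration. Activation starts at one or more seed nodes and flows along weighted links, weakening at each hop; nodes that end up with high activation can be treated as strongly related to the seeds.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, threshold=0.1, max_steps=3):
    """Generic spreading activation over a weighted directed graph.

    graph: dict mapping node -> {neighbour: link weight}
    seeds: dict mapping node -> initial activation
    All parameter names and defaults are illustrative assumptions.
    """
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            # A node only fires if it received enough activation.
            if energy < threshold:
                continue
            for neighbour, weight in graph.get(node, {}).items():
                # Activation weakens with link weight and a global decay.
                next_frontier[neighbour] += energy * weight * decay
        if not next_frontier:
            break
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return dict(activation)


# Hypothetical toy network: a chain of three nodes.
toy_graph = {"a": {"b": 1.0}, "b": {"c": 1.0}}
scores = spread_activation(toy_graph, {"a": 1.0})
```

In this toy run, activation halves at each hop, so nodes closer to the seed score higher. A system deciding whether two nodes refer to the same entity could, under these assumptions, fire one node and check how much activation reaches the other.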
Identifier | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:504371 |
Date | January 2009 |
Creators | Harrington, Brian |
Contributors | Clark, Stephen |
Publisher | University of Oxford |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://ora.ox.ac.uk/objects/uuid:1c7154d3-f7d1-493e-b521-4e5ceb540038 |