<p> Acquiring vast bodies of knowledge in machine-understandable form is one of the main challenges in artificial intelligence. Information Extraction is the task of automatically extracting structured, machine-understandable information from unstructured or semi-structured data. Recent advances in information extraction and the massive scale of data on the Web give artificial intelligence systems a unique opportunity for large-scale automatic knowledge acquisition. However, to realize the full potential of the automatically extracted information, it is essential to understand its semantics. </p><p> A key step in understanding the semantics of extracted information is entity linking: the task of mapping a phrase in text to its referent entity in a given knowledge base. In addition to identifying entities mentioned in text, an AI system can benefit significantly from the organization of entities in a taxonomy. While taxonomies are used in a variety of applications, including IBM’s Jeopardy-winning Watson system, they demand significant effort to create. They are either manually curated or built using semi-supervised machine learning techniques.</p><p> This dissertation explores methods to automatically infer a taxonomy of entities, given the properties that are usually associated with them (e.g., as a City, Chicago is usually associated with properties like "population" and "area"). Our approach is based on the <i>Property Inheritance hypothesis</i>, which states that entities of a specific type in a taxonomy inherit properties from more general types. We apply this hypothesis to two distinct information extraction tasks, each of which is aimed at understanding the semantics of information mined from the Web. First, we describe two systems: (1) TABEL, a state-of-the-art system that performs entity linking on Web tables, and (2) SKEY, a system that extracts key phrases that summarize a document in a given corpus. 
We then apply topic models that encode our hypothesis in a probabilistic framework to automatically infer a taxonomy in each task.</p>
Identifier | oai:union.ndltd.org:PROQUEST/oai:pqdtoai.proquest.com:10117145 |
Date | 09 June 2016 |
Creators | Bhagavatula, Chandra Sekhar |
Publisher | Northwestern University |
Source Sets | ProQuest.com |
Language | English |
Detected Language | English |
Type | thesis |