When computers generate text, they have to consider how to describe the entities mentioned in the text. This situation becomes more difficult when the audience is unknown, as it is not clear what information is available to the addressees. This thesis investigates generation of descriptions in situations when an algorithm does not have a precise model of addressee's knowledge. This thesis starts with the collection and analysis of a corpus of descriptions of famous people. The analysis of the corpus revealed a number of useful patterns, which informed the remainder of this thesis. One of the difficult questions is how to choose information that helps addressees identify the described person. This thesis introduces a corpus-based method for determining which properties are more likely to be known by the addressees, and a probability-based method to identify properties that are distinguishing. One of the patterns observed in the collected corpus is the inclusion of multiple properties each of which uniquely identifies the referent. This thesis introduces a novel corpus-based method for determining how many properties to include in a description. Finally, a number of algorithms that leverage the findings of the corpus analysis and their computational implementation are proposed and tested in an evaluation involving human participants. The proposed algorithms outperformed the Incremental Algorithm in terms of numbers of correctly identified referents and in terms of providing a better mental image of the referent. The main contributions of this thesis are: (1) a corpus-based analysis of descriptions produced for an unknown audience; (2) a computational heuristic for estimating what information is likely to be known to addressees; and (3) algorithms that can generate referring expressions that benefit addressees without having an explicit model of addressee's knowledge.
Identifer | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:633277 |
Date | January 2014 |
Creators | Kutlák, Roman |
Publisher | University of Aberdeen |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=215217 |
Page generated in 0.0021 seconds