
Generation of referring expressions for an unknown audience

When computers generate text, they have to decide how to describe the entities mentioned in it. This task becomes more difficult when the audience is unknown, because it is not clear what information is available to the addressees. This thesis investigates the generation of descriptions in situations where an algorithm does not have a precise model of the addressee's knowledge. It starts with the collection and analysis of a corpus of descriptions of famous people. The analysis revealed a number of useful patterns, which informed the remainder of the thesis. One difficult question is how to choose information that helps addressees identify the described person. The thesis introduces a corpus-based method for determining which properties are more likely to be known by the addressees, and a probability-based method for identifying properties that are distinguishing. One of the patterns observed in the collected corpus is the inclusion of multiple properties, each of which uniquely identifies the referent. The thesis introduces a novel corpus-based method for determining how many properties to include in a description. Finally, a number of algorithms that leverage the findings of the corpus analysis, together with their computational implementations, are proposed and tested in an evaluation involving human participants. The proposed algorithms outperformed the Incremental Algorithm both in the number of correctly identified referents and in providing a better mental image of the referent. The main contributions of this thesis are: (1) a corpus-based analysis of descriptions produced for an unknown audience; (2) a computational heuristic for estimating what information is likely to be known to addressees; and (3) algorithms that can generate referring expressions that benefit addressees without having an explicit model of the addressee's knowledge.

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:633277
Date: January 2014
Creators: Kutlák, Roman
Publisher: University of Aberdeen
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=215217
