
Using a rewriting system to model individual writing styles

Each individual has a distinctive writing style, but natural language generation (NLG) systems produce text with much less variety. Is it possible to make the output of NLG systems more human-like by mimicking the style of particular authors? We start by analysing the texts of real authors. We collect a corpus of texts from a single genre (food recipes), with each text identified by its author, and summarise a variety of writing features in these texts. Each author's writing style is the combination of a set of such features. Analysis of the features shows not only that each author writes differently, but also that these differences are consistent across the whole of that author's corpus. Hence we conclude that authors do maintain a consistent style made up of a variety of different features.

When we discuss notions such as the style and meaning of texts, we are referring to the reaction that readers have to them. It is therefore important in computational linguistics to run experiments that show texts to people and assess how they interpret them. In our research we move beyond simple discussion and statistical analysis of the properties of texts and NLG systems, and perform experiments to verify the actual impact that lexical preference has on real readers. Through experiments that require participants to follow a recipe and prepare food, we conclude that it is possible to alter the lexicon of a recipe without altering the actions performed by the cook, and hence that word choice is an aspect of style rather than semantics. We also conclude that word choice is one of the writing features readers use to identify the author of a text. Among all writing features, individual lexical preference is very important for both analysing and generating texts, so we choose individual lexical choice as our principal topic of research.
Using a modified version of distributional similarity (DS) helps us to choose the words used by individual authors without the limitations of many other solutions, such as pre-built thesauri. We present an algorithm for analysis and rewriting, and assess the results. Based on these results, we propose some further improvements.
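The abstract does not describe the thesis's modification of DS, so the following is only a minimal sketch of the general idea using plain, unmodified distributional similarity: each word is represented by counts of the words that occur within a small window around it, two words count as substitutable when the cosine similarity of their context vectors passes a threshold, and a word is rewritten to the substitutable word the target author uses most often. The function names (`context_vectors`, `cosine`, `rewrite`), the window size, the 0.8 threshold and the toy corpora are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Count the words co-occurring within +/- `window` positions of each word."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        tokens = sent.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rewrite(text, vectors, author_counts, threshold=0.8):
    """Swap each word for the distributionally similar word the author uses most."""
    out = []
    for word in text.lower().split():
        best, best_count = word, author_counts.get(word, 0)
        if word in vectors:  # words with no context vector are left unchanged
            for cand, count in author_counts.items():
                if (cand != word and count > best_count and cand in vectors
                        and cosine(vectors[word], vectors[cand]) >= threshold):
                    best, best_count = cand, count
        out.append(best)
    return " ".join(out)


# Hypothetical toy data: a small general corpus for the context vectors,
# and a few recipe lines from one imagined author for their word preferences.
GENERAL = [
    "chop the onion finely", "dice the onion finely",
    "chop the carrot finely", "dice the carrot finely",
    "boil the water gently", "heat the water gently",
]
AUTHOR_TEXTS = ["dice the onion", "dice the pepper", "heat the oil"]

vecs = context_vectors(GENERAL)
author = Counter(w for s in AUTHOR_TEXTS for w in s.split())

print(rewrite("chop the onion", vecs, author))  # dice the onion
print(rewrite("boil the water", vecs, author))  # heat the water
```

Here "chop" and "dice" occur in identical contexts in the general corpus, so "chop" is replaced by the author's habitual "dice". Choosing substitutes from corpus statistics rather than a fixed synonym list is what lets this style of approach avoid relying on a pre-built thesaurus.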

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:558669
Date: January 2012
Creators: Lin, Jing
Publisher: University of Aberdeen
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=186641
