1 |
Generating references in hierarchical domains : the case of document deixis
Paraboni, Ivandré January 2003
No description available.
|
2 |
The production of prosodic focus and contour in dialogue
Youd, Nicholas John January 1992
Computer programs designed to converse with humans in natural language provide a framework against which to test supra-sentential theories of language production and interpretation. This thesis seeks to flesh out, in terms of a computer model, two basic assumptions concerning prosody: that speakers use intonation to convey intention, or attitude, and that prosodic prominence serves to convey conceptual prominence. A model of an information-providing agent is proposed, based on an analysis of a corpus of spontaneous dialogues. This uses an architecture of communicating processes, which perform interpretation, application-specific planning, repair, and the production of output. Dialogue acts are then defined as feature bundles corresponding to significant events. A corpus of read dialogues is analysed in terms of these features, and using conventional intonational labelling. Correlations between the two are examined. Prosodic prominence is examined at three levels. At the level of surface encoding, re-use of substrings and structural parallelism can reduce processing for the speaker and the listener. At the level of conceptual planning, similar benefits exist, given that speakers and listeners assume a common discourse model wherever possible. At these levels use is made of a short-term buffer of recent forms. A speaker may additionally use contrastive prominence to draw the listener's attention to disparities. Finally, at the level of intentions, a speaker may wish to highlight certain information, regardless of accessibility. Prosodic focus is represented relationally, rather than via a simple binary-valued feature. This has the advantage of facilitating the mapping between levels; it also renders straightforward the notion of focus as the product of a number of potentially conflicting influences. Those parts of the theory concerned with discourse representation, language generation, and prosodic focus have been implemented as part of the Sundial dialogue system.
In this system, discoursal and pragmatic decisions affecting prosody are converted to annotations on a text string, for realisation by a rule-based synthesizer.
|
3 |
The role of document structure in text generation
Bouayad-Agha, Nadjet January 2001
No description available.
|
4 |
Grammars for generating isiXhosa and isiZulu weather bulletin verbs
Mahlaza, Zola January 2018
The Met Office has investigated the use of natural language generation (NLG) technologies to streamline the production of weather forecasts. Their approach would be of great benefit in South Africa because there is no fast and large scale producer, automated or otherwise, of textual weather summaries for Nguni languages. This is because of, among other things, the complexity of Nguni languages. The structure of these languages is very different from Indo-European languages, and therefore we cannot reuse existing technologies that were developed for the latter group. Traditional NLG techniques such as templates are not compatible with 'Bantu' languages, and existing works that document scaled-down 'Bantu' language grammars are also not sufficient to generate weather text. In pursuit of generating weather text in isiXhosa and isiZulu, we restricted our text to verbs only in order to ensure a manageable scope. In particular, we have developed a corpus of weather sentences in order to determine verb features. We then created context free verbal grammar rules using an incremental approach. The quality of these rules was evaluated by two linguists. We then investigated the grammatical similarity of isiZulu verbs with their isiXhosa counterparts, and the extent to which a single merged set of grammar rules can be used to produce correct verbs for both languages. The similarity analysis of the two languages was done through the developed rules' parse trees, and by applying binary similarity measures on the sets of verbs generated by the rules. The parse trees show that the differences between the verb's components are minor, and the similarity measures indicate that the verb sets are at most 59.5% similar (Driver-Kroeber metric). We also examined the importance of the phonological conditioning process by developing functions that calculate the ratio of verbs that will require conditioning out of the total strings that can be generated.
We have found that the phonological conditioning process affects at least 45% of strings for isiXhosa, and at least 67% of strings for isiZulu, depending on the type of verb root that is used. Overall, this work shows that the differences between isiXhosa and isiZulu verbs are minor; however, these similarities cannot be exploited to create a unified rule set for both languages without significant maintainability compromises, because there are dependencies between the verb's 'modules' that exist in one language and not the other. Furthermore, the phonological conditioning process should be implemented in order to improve generated text, given the high ratio of verbs it affects.
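The set-based comparison described in this abstract can be illustrated with a minimal sketch. It assumes two sets of generated verb strings and uses the Jaccard coefficient as a stand-in binary similarity measure; it is not the Driver-Kroeber metric reported above, whose exact formulation is not given here. The function names and the placeholder strings are hypothetical, not data from the thesis.

```python
def contingency(a_set, b_set):
    """Return (shared, only_a, only_b) counts for two sets of strings."""
    shared = len(a_set & b_set)
    only_a = len(a_set - b_set)
    only_b = len(b_set - a_set)
    return shared, only_a, only_b


def jaccard(a_set, b_set):
    """Jaccard coefficient: shared items over all distinct items."""
    shared, only_a, only_b = contingency(a_set, b_set)
    total = shared + only_a + only_b
    # Two empty sets are treated as identical.
    return shared / total if total else 1.0


# Placeholder outputs of two hypothetical rule sets (not real verb forms).
rules_a_output = {"form_1", "form_2", "form_3", "form_4"}
rules_b_output = {"form_2", "form_3", "form_4", "form_5", "form_6"}

similarity = jaccard(rules_a_output, rules_b_output)
```

Other binary similarity measures can be computed from the same three counts, so swapping the measure only changes the final formula.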
|
5 |
Joint models for concept-to-text generation
Konstas, Ioannis January 2014
Much of the data found on the world wide web is in numeric, tabular, or other non-textual format (e.g., weather forecast tables, stock market charts, live sensor feeds), and thus inaccessible to non-experts or laypersons. However, most conventional search engines and natural language processing tools (e.g., summarisers) can only handle textual input. As a result, data in non-textual form remains largely inaccessible. Concept-to-text generation refers to the task of automatically producing textual output from non-linguistic input, and holds promise for rendering non-linguistic data widely accessible. Several successful generation systems have been produced in the past twenty years. They mostly rely on human-crafted rules or expert-driven grammars, implement a pipeline architecture, and usually operate in a single domain. In this thesis, we present several novel statistical models that take as input a set of database records and generate a description of them in natural language text. Our unique idea is to combine the processes of structuring a document (document planning), deciding what to say (content selection) and choosing the specific words and syntactic constructs specifying how to say it (lexicalisation and surface realisation), in a uniform joint manner. Rather than breaking up the generation process into a sequence of local decisions, we define a probabilistic context-free grammar that globally describes the inherent structure of the input (a corpus of database records and text describing some of them). This joint representation allows individual processes (i.e., document planning, content selection, and surface realisation) to communicate and influence each other naturally.
We recast generation as the task of finding the best derivation tree for a set of input database records and our grammar, and describe several algorithms for decoding in this framework that allow us to intersect the grammar with additional information capturing fluency and syntactic well-formedness constraints. We implement our generators using the hypergraph framework. Unlike traditional systems, we learn all the necessary document, structural and linguistic knowledge from unannotated data. Additionally, we explore a discriminative reranking approach on the hypergraph representation of our model, by including more refined content selection features. Central to our approach is the idea of porting our models to various domains; we experimented on four widely different domains, namely sportscasting, weather forecast generation, booking flights, and troubleshooting guides. The performance of our systems is competitive with, and often superior to, state-of-the-art systems that use domain specific constraints, explicit feature engineering or labelled data.
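The idea of finding the best derivation tree under a probabilistic context-free grammar can be sketched with a Viterbi-style CKY search. This is only an illustration of best-derivation search over a word sequence, not the thesis's record-to-text decoder or its hypergraph implementation; the toy grammar, its probabilities, and the function name `viterbi_cky` are all assumptions made for the example.

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form; every rule and probability is hypothetical.
BINARY = {  # (left child, right child) -> [(parent, rule probability)]
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
}
LEXICAL = {  # word -> [(tag, rule probability)]
    "rain": [("NP", 0.5), ("V", 0.2)],
    "clouds": [("NP", 0.5)],
    "brings": [("V", 0.8)],
}


def viterbi_cky(words):
    """Return (probability, tree) of the best S derivation, or None."""
    n = len(words)
    best = defaultdict(dict)  # (start, end) -> {symbol: (prob, tree)}
    for i, word in enumerate(words):
        for tag, prob in LEXICAL.get(word, []):
            best[(i, i + 1)][tag] = (prob, (tag, word))
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                for lsym, (lprob, ltree) in best[(start, split)].items():
                    for rsym, (rprob, rtree) in best[(split, end)].items():
                        for parent, rule_prob in BINARY.get((lsym, rsym), []):
                            prob = rule_prob * lprob * rprob
                            current = best[(start, end)].get(parent, (0.0, None))
                            # Keep only the highest-probability derivation.
                            if prob > current[0]:
                                best[(start, end)][parent] = (prob, (parent, ltree, rtree))
    return best[(0, n)].get("S")


result = viterbi_cky(["clouds", "brings", "rain"])
```

Because each chart cell keeps only the best derivation per symbol, the search is global: a locally likely tag such as `V` for "rain" is discarded when it cannot take part in a complete derivation.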
|
6 |
Learning to tell tales : automatic story generation from corpora
McIntyre, Neil Duncan January 2011
Automatic story generation has a long-standing tradition in the field of Artificial Intelligence. The ability to create stories on demand holds great potential for entertainment and education. For example, modern computer games are becoming more immersive, containing multiple story lines and hundreds of characters. This has substantially increased the amount of work required to produce each game. However, by allowing the game to write its own story line, it can remain engaging to the player whilst shifting the burden of writing away from the game’s developers. In education, intelligent tutoring systems can potentially provide students with instant feedback and suggestions of how to write their own stories. Although several approaches have been introduced in the past (e.g., story grammars, story schema and autonomous agents), they all rely heavily on handwritten resources, which places severe limitations on their scalability and usage. In this thesis we will motivate a new approach to story generation which takes its inspiration from recent research in Natural Language Generation. The result is an interactive data-driven system for the generation of children’s stories. One of the key features of this system is that it is end-to-end, realising the various components of the generation pipeline stochastically. Knowledge relating to the generation and planning of stories is leveraged automatically from corpora and reformulated into new stories to be presented to the user. We will also show that story generation can be viewed as a search task, operating over a large number of stories that can be generated from knowledge inherent in a corpus. Using trainable scoring functions, our system can search the story space using different document level criteria. In this thesis we focus on two of these, namely, coherence and interest. We will also present two major paradigms for generation through search: (a) generate and rank, and (b) genetic algorithms.
We show the effects on perceived story interest, fluency and coherence that result from these approaches. In addition, we show how the explicit use of plots induced from the corpus can be used to guide the generation process, providing a heuristically motivated starting point for story search. We motivate extensions to the system and show that additional modules can be used to improve the quality of the generated stories and overall scalability. Finally we highlight the current strengths and limitations of our approach and discuss possible future approaches to this field of research.
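The generate-and-rank paradigm mentioned in this abstract can be sketched as follows. The sketch assumes a tiny hand-written candidate space and a hand-set "interest" score in place of the trained scoring functions the thesis describes; the word lists, weights, and function names are hypothetical.

```python
from itertools import product

# Hypothetical building blocks standing in for knowledge mined from a corpus.
SUBJECTS = ["the dog", "the princess"]
VERBS = ["found", "lost"]
OBJECTS = ["a key", "a dragon"]

# Hand-set weights standing in for a trained interest model (an assumption).
INTEREST = {"dragon": 2.0, "princess": 1.0, "found": 0.5}


def generate():
    """Enumerate every candidate sentence in the toy story space."""
    for subject, verb, obj in product(SUBJECTS, VERBS, OBJECTS):
        yield f"{subject} {verb} {obj}"


def score(sentence):
    """Sum the interest weights of the words in a candidate."""
    return sum(INTEREST.get(token, 0.0) for token in sentence.split())


def generate_and_rank(top_k=1):
    """Generate all candidates, then keep the top_k by score."""
    return sorted(generate(), key=score, reverse=True)[:top_k]


best_story = generate_and_rank(top_k=1)[0]
```

Exhaustive enumeration only works for very small spaces; a genetic algorithm, the second paradigm named above, would instead explore the space by mutating and recombining candidates under the same kind of scoring function.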
|
7 |
Linguistically Motivated Features for CCG Realization Ranking
Rajkumar, Rajakrishnan P. 19 July 2012
No description available.
|
8 |
Personality and alignment processes in dialogue : towards a lexically-based unified model
Brockmann, Carsten January 2009
This thesis explores approaches to modelling individual differences in language use. The differences under consideration fall into two broad categories: Variation of the personality projected through language, and modelling of language alignment behaviour between dialogue partners. In a way, these two aspects oppose each other – language related to varying personalities should be recognisably different, while aligning speakers agree on common language during a dialogue. The central hypothesis is that such variation can be captured and produced with restricted computational means. Results from research on personality psychology and psycholinguistics are transformed into a series of lexically-based Affective Language Production Models (ALPMs) which are parameterisable for personality and alignment. The models are then explored by varying the parameters and observing the language they generate. ALPM-1 and ALPM-2 re-generate dialogues from existing utterances which are ranked and filtered according to manually selected linguistic and psycholinguistic features that were found to be related to personality. ALPM-3 is based on true overgeneration of paraphrases from semantic representations using the OPENCCG framework for Combinatory Categorial Grammar (CCG), in combination with corpus-based ranking and filtering by way of n-gram language models. Personality effects are achieved through language models built from the language of speakers of known personality. In ALPM-4, alignment is captured via a cache language model that remembers the previous utterance and thus influences the choice of the next. This model provides a unified treatment of personality and alignment processes in dialogue. In order to evaluate the ALPMs, dialogues between computer characters were generated and presented to human judges who were asked to assess the characters’ personality. 
In further internal simulations, cache language models were used to reproduce results of psycholinguistic priming studies. The experiments showed that the models are capable of producing natural language dialogue which exhibits human-like personality and alignment effects.
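The cache language model idea described for ALPM-4 can be illustrated with a minimal sketch. It assumes unigram models and a fixed interpolation weight, whereas the thesis works with n-gram language models; the background text, the weight `lam`, and the function names are hypothetical.

```python
from collections import Counter


def unigram(tokens):
    """Build a maximum-likelihood unigram model from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return lambda word: counts[word] / total if total else 0.0


# Hypothetical background text standing in for a speaker's language model.
base = unigram("the couch is red the couch is big the sofa is old".split())


def cached_prob(word, previous_utterance, lam=0.5):
    """Interpolate the base model with a cache of the previous utterance."""
    cache = unigram(previous_utterance.split())
    return (1 - lam) * base(word) + lam * cache(word)


partner_said = "I like the sofa"
p_sofa = cached_prob("sofa", partner_said)
p_couch = cached_prob("couch", partner_said)
```

In the background model "couch" is the more likely word, but once the partner has said "sofa" the cache raises its probability above that of "couch", which is the alignment effect in miniature.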
|
9 |
Natural language generation as neural sequence learning and beyond
Zhang, Xingxing January 2017
Natural Language Generation (NLG) is the task of generating natural language (e.g., English sentences) from machine readable input. In the past few years, deep neural networks have received great attention from the natural language processing community due to impressive performance across different tasks. This thesis addresses NLG problems with deep neural networks from two different modeling views. Under the first view, natural language sentences are modelled as sequences of words, which greatly simplifies their representation and allows us to apply classic sequence modelling neural networks (i.e., recurrent neural networks) to various NLG tasks. Under the second view, natural language sentences are modelled as dependency trees, which are more expressive and allow us to capture linguistic generalisations, leading to neural models which operate on tree structures. Specifically, this thesis develops several novel neural models for natural language generation. Contrary to many existing models which aim to generate a single sentence, we propose a novel hierarchical recurrent neural network architecture to represent and generate multiple sentences. Beyond the hierarchical recurrent structure, we also propose a means to model context dynamically during generation. We apply this model to the task of Chinese poetry generation and show that it outperforms competitive poetry generation systems. Neural-based natural language generation models usually work well when there is a lot of training data. When the training data is not sufficient, prior knowledge for the task at hand becomes very important. To this end, we propose a deep reinforcement learning framework to inject prior knowledge into neural-based NLG models and apply it to sentence simplification. Experimental results show promising performance using our reinforcement learning framework.
Both poetry generation and sentence simplification are tackled with models following the sequence learning view, where sentences are treated as word sequences. In this thesis, we also explore how to generate natural language sentences as tree structures. We propose a neural model, which combines the advantages of syntactic structure and recurrent neural networks. More concretely, our model defines the probability of a sentence by estimating the generation probability of its dependency tree. At each time step, a node is generated based on the representation of the generated subtree. We show experimentally that this model achieves good performance in language modeling and can also generate dependency trees.
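The tree-based view in this abstract can be written as a chain-rule factorisation. The notation is an assumption introduced here for illustration: $T(S)$ is the dependency tree of sentence $S$, $w_t$ is the node generated at time step $t$, and $T_{<t}$ is the subtree generated before step $t$.

```latex
P(S) \;=\; P\bigl(T(S)\bigr) \;=\; \prod_{t=1}^{n} P\bigl(w_t \mid T_{<t}\bigr)
```

Under this reading, the sequence view is the special case in which $T_{<t}$ is simply the prefix $w_1, \dots, w_{t-1}$.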
|
10 |
Geração de expressões de referência em situações de comunicação com restrição de tempo / Referring Expression Generation in time-constrained situations of communication
Mariotti, Andre Costa 13 September 2017
This document presents MSc research focused on the computational task of Referring Expression Generation (REG), a fundamental part of communication that is studied in Natural Language Generation (NLG). More specifically, this work studies the aspects of language that manifest in time-constrained contexts of communication and, on that basis, proposes a computational REG model that produces referring expressions with a parameterisable level of overspecification. Furthermore, given the adaptability of the proposed model, a generalisation to other domains is also suggested, including communication contexts beyond those with time constraints.
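A referring expression generator with a tunable level of overspecification might look like the sketch below. It is not the model proposed in the dissertation: it assumes a simple attribute-selection loop in the style of the classic incremental algorithm, followed by an `extra` parameter that adds redundant attributes. The function name, the attribute names, and the example scene are hypothetical.

```python
def refer(target, distractors, preferred, extra=0):
    """Pick attributes that single out the target, then add up to `extra` redundant ones."""
    remaining = list(distractors)
    chosen = []
    for attr in preferred:
        if not remaining:
            break  # the target is already uniquely identified
        # Keep an attribute only if it rules out at least one distractor.
        if any(d.get(attr) != target[attr] for d in remaining):
            chosen.append(attr)
            remaining = [d for d in remaining if d.get(attr) == target[attr]]
    # Overspecification: attributes that are not needed to identify the target.
    redundant = [a for a in preferred if a not in chosen][:extra]
    return {attr: target[attr] for attr in chosen + redundant}


# Hypothetical scene: one target object and two distractors.
target = {"type": "chair", "colour": "red", "size": "large"}
distractors = [
    {"type": "chair", "colour": "blue", "size": "large"},
    {"type": "table", "colour": "red", "size": "small"},
]
preferred = ["type", "colour", "size"]

minimal = refer(target, distractors, preferred, extra=0)
overspecified = refer(target, distractors, preferred, extra=1)
```

With `extra=0` the description is just enough to identify the target ("the red chair"); raising `extra` adds attributes such as size that a listener does not strictly need.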
|