1

Class-based statistical models for lexical knowledge acquisition

Clark, Stephen January 2001 (has links)
This thesis is about the automatic acquisition of a particular kind of lexical knowledge, namely the knowledge of which noun senses can fill the argument slots of predicates. The knowledge is represented using probabilities, which agrees with the intuition that there are no absolute constraints on the arguments of predicates, but that the constraints are satisfied to a certain degree; thus the problem of knowledge acquisition becomes the problem of probability estimation from corpus data. The problem with defining a probability model in terms of senses is that this involves a huge number of parameters, which results in a sparse data problem. The proposal here is to define a probability model over senses in a semantic hierarchy, and exploit the fact that senses can be grouped into classes consisting of semantically similar senses. A novel class-based estimation technique is developed, together with a procedure that determines a suitable class for a sense (given a predicate and argument position). The problem of determining a suitable class can be thought of as finding a suitable level of generalisation in the hierarchy. The generalisation procedure uses a statistical test to locate areas consisting of semantically similar senses, and, as well as being used for probability estimation, is also employed as part of a re-estimation algorithm for estimating sense frequencies from incomplete data. The rest of the thesis considers how the lexical knowledge can be used to resolve structural ambiguities, and provides empirical evaluations. The estimation techniques are first integrated into a parse selection system, using a probabilistic dependency model to rank the alternative parses for a sentence. Then, a PP-attachment task is used to provide an evaluation which is more focussed on the class-based estimation technique, and, finally, a pseudo disambiguation task is used to compare the estimation technique with alternative approaches.
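The class-based generalisation idea can be illustrated with a small sketch (this is not Clark's actual estimator, which works over WordNet and uses a statistical test to choose the generalisation level): counts for the noun senses filling a predicate's argument slot are pooled at a class node in a toy hierarchy, and the pooled probability mass is then shared among the senses that class dominates. The hierarchy, counts, and function names below are invented for illustration.

```python
# Toy illustration of class-based estimation over a semantic hierarchy.
# The hierarchy, counts, and the uniform-sharing assumption are all
# simplifications of the approach described in the abstract above.

from collections import defaultdict

# child -> parent links in a tiny noun hierarchy (invented)
PARENT = {
    "dog": "animal", "cat": "animal", "sparrow": "bird", "bird": "animal",
    "animal": "entity", "car": "artifact", "artifact": "entity",
}

def descend(cls):
    """All leaf senses dominated by a class (the class itself if it is a leaf)."""
    children = [c for c, p in PARENT.items() if p == cls]
    if not children:
        return [cls]
    leaves = []
    for c in children:
        leaves.extend(descend(c))
    return leaves

def class_based_prob(sense, generalisation_class, slot_counts):
    """Estimate P(sense | predicate, slot) by pooling counts at a class
    and sharing the class's probability mass uniformly over its senses."""
    leaves = descend(generalisation_class)
    total = sum(slot_counts.values())
    class_count = sum(slot_counts.get(leaf, 0) for leaf in leaves)
    if total == 0 or sense not in leaves:
        return 0.0
    return (class_count / total) / len(leaves)

# Observed fillers of the object slot of "chase" in a toy corpus
counts = defaultdict(int, {"cat": 3, "dog": 2, "sparrow": 1})

# "dog" estimated via the class "animal" rather than from its own count alone
print(class_based_prob("dog", "animal", counts))
```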
2

Knowledge representation in natural language: the wordicle - a subconscious connection

Downey, Daniel J. G. January 1991 (has links)
No description available.
3

Computing presuppositions in an incremental natural language processing system

Bridge, Derek G. January 1991 (has links)
No description available.
4

Learning unification-based natural language grammars

Osborne, Miles January 1994 (has links)
No description available.
5

The representation of natural language to enable neural networks to detect syntactic features

Lyon, Caroline January 1994 (has links)
No description available.
6

Measuring text reuse

Clough, Paul D. January 2002 (has links)
No description available.
7

Automatic generation of spatial configurations in user interfaces

Fischer, Markus January 1998 (has links)
No description available.
8

New models of natural language for consultative computing

Gwei, G. M. January 1987 (has links)
No description available.
9

Natural language generation in the LOLITA system: an engineering approach

Smith, Mark H. January 1995 (has links)
Natural Language Generation (NLG) is the automatic generation of Natural Language (NL) by computer in order to meet communicative goals. One aim of NL processing (NLP) is to allow more natural communication with a computer and, since communication is a two-way process, an NL system should be able to produce as well as interpret NL text. This research concerns the design and implementation of an NLG module for the LOLITA system. LOLITA (Large scale, Object-based, Linguistic Interactor, Translator and Analyser) is a general-purpose base NLP system which performs core NLP tasks and upon which prototype NL applications have been built. As part of this encompassing project, this research shares some of its properties and methodological assumptions: the LOLITA generator has been built following Natural Language Engineering principles, uses LOLITA's SemNet representation as input, and is implemented in the functional programming language Haskell. As in other generation systems, the adopted solution utilises a two-component architecture. However, in order to avoid problems which occur at the interface between traditional planning and realisation modules (known as the generation gap), the distribution of tasks between the planner and plan-realiser is different: the plan-realiser, in the absence of detailed planning instructions, must perform some tasks (such as the selection and ordering of content) which are more traditionally performed by a planner. This work largely concerns the development of the plan-realiser and its interface with the planner. Another aspect of the solution is the use of Abstract Transformations, which act on the SemNet input before realisation, leading to an increased ability to create paraphrases. The research has led to a practical working solution which has greatly increased the power of the LOLITA system. The research also investigates how NLG systems can be evaluated and the advantages and disadvantages of using a functional language for the generation task.
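The division of labour described above can be sketched in simplified form. This mock-up is in Python rather than the Haskell of the actual system, and the semantic network, plan format, and realisation rules are invented placeholders; the point is only that the planner emits a coarse communicative goal while the plan-realiser itself selects and orders content before realising it as text.

```python
# Minimal mock-up of a two-component generator in which the plan-realiser,
# not the planner, performs content selection and ordering. All data
# structures and names here are invented placeholders, not LOLITA's.

# A tiny stand-in for a semantic network: node -> list of (relation, value)
SEMNET = {
    "event1": [("agent", "LOLITA"), ("action", "analyse"),
               ("object", "the sentence"), ("salience", 0.9)],
    "event2": [("agent", "LOLITA"), ("action", "generate"),
               ("object", "a reply"), ("salience", 0.6)],
}

def planner(goal_node_ids):
    """Coarse planning only: say which nodes to talk about, nothing more."""
    return {"speech_act": "report", "nodes": goal_node_ids}

def plan_realiser(plan, semnet):
    """Select and order content, then realise each node as a clause."""
    facts = {n: dict(semnet[n]) for n in plan["nodes"]}
    # Content selection/ordering pushed into the realiser: most salient first.
    ordered = sorted(facts, key=lambda n: -facts[n]["salience"])
    clauses = [f'{facts[n]["agent"]} {facts[n]["action"]}s {facts[n]["object"]}'
               for n in ordered]
    return " and ".join(clauses) + "."

plan = planner(["event1", "event2"])
print(plan_realiser(plan, SEMNET))
# -> "LOLITA analyses the sentence and LOLITA generates a reply."
```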
10

Managing surface ambiguity in the generation of referring expressions

Khan, Imtiaz Hussain January 2010 (has links)
Most algorithms for the Generation of Referring Expressions tend to generate distinguishing descriptions at the semantic level, disregarding the ways in which surface issues can affect their quality. This thesis explores the role of surface ambiguities in referring expressions and how the risk of such ambiguities should be taken into account by an algorithm that generates referring expressions. This was done by focussing on the type of surface ambiguity which arises when adjectives occur in coordinated structures (as in "the old men and women"). The central idea is to use statistical information about lexical co-occurrence to estimate which interpretation of a phrase is most likely for human readers, and to avoid generating phrases where misunderstandings are likely. We develop specific hypotheses and test them by running experiments with human participants. We found that Word Sketches are a reliable source of information for predicting the likelihood of a reading. The avoidance of misunderstandings is not the only issue dealt with in this thesis. Since the avoidance of misunderstandings might be achieved at the cost of very lengthy (or perhaps very disfluent) expressions, it is important to select an optimal expression (i.e., the expression which is preferred by most readers) from the various alternatives available. Again, we develop specific hypotheses and record human preferences in a forced-choice manner. We found that participants preferred clear (i.e., not likely to be misunderstood) expressions to unclear ones, but if several of the expressions were clear then brief expressions were preferred over their longer counterparts. The results of these empirical studies motivated the design of a GRE algorithm. The implemented algorithm builds a plural distinguishing description for the intended referents (if one exists), using words; applies transformation rules to construct a set of logically equivalent distinguishing descriptions; realises each description in the set as an English noun phrase (NP) using appropriate realisation rules; determines the most likely reading of each NP; and selects one NP for output. A further experiment verifies that the kinds of expressions produced by the algorithm are optimal for readers: they are understood accurately and quickly by readers.
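The selection strategy suggested by these experiments (prefer expressions unlikely to be misread; among equally clear candidates, prefer the briefest) can be sketched as follows. The candidate phrases and the clarity scores, which stand in for likelihood-of-misreading estimates derived from Word Sketch statistics, are invented for illustration.

```python
# Toy selection of one noun phrase from logically equivalent realisations:
# prefer candidates judged clear (unlikely to be misread), and among the
# clear ones prefer the briefest. Scores and phrases are invented.

CLARITY_THRESHOLD = 0.8  # assumed cut-off for "likely to be read as intended"

def select_np(candidates):
    """candidates: list of (np_string, estimated_probability_of_intended_reading)."""
    clear = [(np, p) for np, p in candidates if p >= CLARITY_THRESHOLD]
    pool = clear if clear else candidates          # fall back if nothing is clear
    # Brevity in words as the main criterion, clarity as the tie-breaker.
    return min(pool, key=lambda c: (len(c[0].split()), -c[1]))[0]

candidates = [
    ("the old men and women", 0.55),               # ambiguous coordination
    ("the old men and the women", 0.92),
    ("the women and the old men", 0.95),
]
print(select_np(candidates))  # -> "the women and the old men"
```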
