Global ETD Search

141	Parsing and Generating English Using Commutative Transformations Katz, Boris, Winston, Patrick H. 01 May 1982 (has links) This paper is about an implemented natural language interface that translates from English into semantic net relations and from semantic net relations back into English. The parser and companion generator were implemented for two reasons: (a) to enable experimental work in support of a theory of learning by analogy; (b) to demonstrate the viability of a theory of parsing and generation built on commutative transformations. The learning theory was shaped to a great degree by experiments that would have been extraordinarily tedious to perform without the English interface with which the experimental data base was prepared, revised, and revised again. Inasmuch as current work on the learning theory is moving toward a tenfold increase in data-base size, the English interface is moving from a facilitating role to an enabling one. The parsing and generation theory has two particularly important features: (a) the same grammar is used for both parsing and generation; (b) the transformations of the grammar are commutative. The language generation procedure converts a semantic network fragment into kernel frames, chooses the set of transformations that should be performed upon each frame, executes the specified transformations, combines the altered kernels into a sentence, performs a pronominalization process, and finally produces the appropriate English word string. Parsing is essentially the reverse of generation. The first step in the parsing process is splitting a given sentence into a set of kernel clauses along with a description of how those clauses are hierarchically related to each other. The clauses are used to produce a matrix of embedded kernel frames, which in turn supply arguments to relation-creating functions.
The evaluation of the relation-creating functions results in the construction of the semantic net fragments. parsing generation natural language semantic networks commutative transformations language understanding
142	Causal Reconstruction Borchardt, Gary C. 01 February 1993 (has links) Causal reconstruction is the task of reading a written causal description of a physical behavior, forming an internal model of the described activity, and demonstrating comprehension through question answering. This task is difficult because written descriptions often do not specify exactly how referenced events fit together. This article (1) characterizes the causal reconstruction problem, (2) presents a representation called transition space, which portrays events in terms of "transitions," or collections of changes expressible in everyday language, and (3) describes a program called PATHFINDER, which uses the transition space representation to perform causal reconstruction on simplified English descriptions of physical activity. knowledge representation explanation causal reasoning analogy abstraction natural language
143	Computational Structure of GPSG Models: Revised Generalized Phrase Structure Grammar Ristad, Eric Sven 01 September 1989 (has links) The primary goal of this report is to demonstrate how considerations from computational complexity theory can inform grammatical theorizing. To this end, generalized phrase structure grammar (GPSG) linguistic theory is revised so that its power more closely matches the limited ability of an ideal speaker-hearer: GPSG Recognition is EXP-POLY time hard, while Revised GPSG Recognition is NP-complete. A second goal is to provide a theoretical framework within which to better understand the wide range of existing GPSG models, embodied in formal definitions as well as in implemented computer programs. A grammar for English and an informal explanation of the GPSG/RGPSG syntactic features are included in appendices. linguistics complexity GPSG natural language computational structure computational complexity
144	Using Analogy to Acquire Commonsense Knowledge from Human Contributors Chklovski, Timothy 12 February 2003 (has links) The goal of the work reported here is to capture the commonsense knowledge of non-expert human contributors. Achieving this goal will enable more intelligent human-computer interfaces and pave the way for computers to reason about our world. In the domain of natural language processing, it will provide the world knowledge much needed for semantic processing of natural language. To acquire knowledge from contributors not trained in knowledge engineering, I take the following four steps: (i) develop a knowledge representation (KR) model for simple assertions in natural language, (ii) introduce cumulative analogy, a class of nearest-neighbor based analogical reasoning algorithms over this representation, (iii) argue that cumulative analogy is well suited for knowledge acquisition (KA) based on a theoretical analysis of the effectiveness of KA with this approach, and (iv) test the KR model and the effectiveness of the cumulative analogy algorithms empirically. To investigate the effectiveness of cumulative analogy for KA empirically, Learner, an open source system for KA by cumulative analogy, has been implemented, deployed, and evaluated. (The site, "1001 Questions," is available at http://teach-computers.org/learner.html). Learner acquires assertion-level knowledge by constructing shallow semantic analogies between a KA topic and its nearest neighbors and posing these analogies as natural language questions to human contributors. Suppose, for example, that based on the knowledge about "newspapers" already present in the knowledge base, Learner judges "newspaper" to be similar to "book" and "magazine." Further suppose that assertions "books contain information" and "magazines contain information" are also already in the knowledge base. Then Learner will use cumulative analogy from the similar topics to ask humans whether "newspapers contain information."
Because similarity between topics is computed based on what is already known about them, Learner exhibits bootstrapping behavior: the quality of its questions improves as it gathers more knowledge. By summing evidence for and against posing any given question, Learner also exhibits noise tolerance, limiting the effect of incorrect similarities. The KA power of shallow semantic analogy from nearest neighbors is one of the main findings of this thesis. I perform an analysis of commonsense knowledge collected by another research effort that did not rely on analogical reasoning and demonstrate that there is indeed a sufficient amount of correlation in the knowledge base to motivate using cumulative analogy from nearest neighbors as a KA method. Empirically, the percentages of questions answered affirmatively, answered negatively, and judged to be nonsensical in the cumulative analogy case compare favorably with the baseline, no-similarity case that relies on random objects rather than nearest neighbors. Of the questions generated by cumulative analogy, contributors answered 45% affirmatively, 28% negatively and marked 13% as nonsensical; in the control, no-similarity case 8% of questions were answered affirmatively, 60% negatively and 26% were marked as nonsensical. AI knowledge acquisition knowledge capture analogy natural language reasoning
145	Complexity of Human Language Comprehension Ristad, Eric Sven 01 December 1988 (has links) The goal of this article is to reveal the computational structure of modern principle-and-parameter (Chomskian) linguistic theories: what computational problems do these informal theories pose, and what is the underlying structure of those computations? To do this, I analyze the computational complexity of human language comprehension: what linguistic representation is assigned to a given sound? This problem is factored into smaller, interrelated (but independently statable) problems. For example, in order to understand a given sound, the listener must assign a phonetic form to the sound; determine the morphemes that compose the words in the sound; and calculate the linguistic antecedent of every pronoun in the utterance. I prove that these and other subproblems are all NP-hard, and that language comprehension is itself PSPACE-hard. linguistic theory natural language computational complexity government-binding phonology syntax
146	GeneTUC: Natural Language Understanding in Medical Text Sætre, Rune January 2006 (has links) Natural Language Understanding (NLU) is a 50-year-old research field, but its application to molecular biology literature (BioNLU) is less than 10 years old. After the complete human genome sequence was published by the Human Genome Project and Celera in 2001, there has been an explosion of research, shifting the NLU focus from domains like news articles to the domain of molecular biology and medical literature. BioNLU is needed, since there are almost 2000 new articles published and indexed every day, and biologists need to know about existing knowledge regarding their own research. So far, BioNLU results are not as good as in other NLU domains, so more research is needed to solve the challenges of creating useful NLU applications for biologists. The work in this PhD thesis is a “proof of concept”. It is the first to show that an existing Question Answering (QA) system can be successfully applied in the hard BioNLU domain, after the essential challenge of unknown entities is solved. The core contribution is a system that discovers and classifies unknown entities and relations between them automatically. The World Wide Web (through Google) is used as the main resource, and the performance is almost as good as other named entity extraction systems, but the advantage of this approach is that it is much simpler and requires less manual labor than any of the other comparable systems. The first paper in this collection gives an overview of the field of NLU and shows how the Information Extraction (IE) problem can be formulated with Local Grammars. The second paper uses Machine Learning to automatically recognize protein names based on features from the GSearch Engine.
In the third paper, GSearch is substituted with Google, and the task is to extract all unknown names belonging to one of 273 biomedical entity classes, such as genes, proteins, and processes. After getting promising results with Google, the fourth paper shows that this approach can also be used to retrieve interactions or relationships between the named entities. The fifth paper describes an online implementation of the system, and shows that the method scales well to a larger set of entities. The final paper concludes the “proof of concept” research, and shows that the performance of the original GeneTUC NLU system has increased from handling 10% of the sentences in a large collection of abstracts in 2001, to 50% in 2006. This is still not good enough to create a commercial system, but it is believed that another 40% performance gain can be achieved by importing more verb templates into GeneTUC, just like nouns were imported during this work. Work has already begun on this, in the form of a local Master's thesis. Information Extraction (IE) Natural Language Processing (NLP) Bio-informatics
147	Automatic Supervised Thesauri Construction with Roget’s Thesaurus Kennedy, Alistair H 07 December 2012 (has links) Thesauri are important tools for many Natural Language Processing applications. Roget's Thesaurus is particularly useful. It is of high quality and has been in development for over a century and a half. Yet its applications have been limited, largely because the only publicly available edition dates from 1911. This thesis proposes and tests methods of automatically updating the vocabulary of the 1911 Roget’s Thesaurus. I use the Thesaurus as a source of training data in order to learn from Roget’s for the purpose of updating Roget’s. The lexicon is updated in two stages. First, I develop a measure of semantic relatedness that enhances existing distributional techniques. I improve existing methods by using known sets of synonyms from Roget’s to train a distributional measure to better identify near synonyms. Second, I use the new measure of semantic relatedness to find where in Roget’s to place a new word. Existing words from Roget’s are used as training data to tune the parameters of three methods of inserting words. Over 5000 new words and word-senses were added using this process. I conduct two kinds of evaluation on the updated Thesaurus. One is on the procedure for updating Roget’s. This is accomplished by removing some words from the Thesaurus and testing my system's ability to reinsert them in the correct location. Human evaluation of the newly added words is also performed. Annotators must determine whether a newly added word is in the correct location. They found that in most cases the new words were almost indistinguishable from those already existing in Roget's Thesaurus. The second kind of evaluation is to establish the usefulness of the updated Roget’s Thesaurus on actual Natural Language Processing applications. 
These applications include determining semantic relatedness between word pairs or sentence pairs, identifying the best synonym from a set of candidates, solving SAT-style analogy problems, pseudo-word-sense disambiguation, and sentence ranking for text summarization. The updated Thesaurus consistently performed at least as well as, or better than, the original Thesaurus on all these applications. Roget's Thesaurus Natural Language Processing Distributional Semantics Thesauri construction
148	Corpus construction based on Ontological domain knowledge Benis, Nirupama, Kaliyaperumal, Rajaram January 2011 (has links) The purpose of this thesis is to contribute a corpus for sentence-level interpretation of biomedical language. The available corpora for the biomedical domain are small in terms of amount of text and predicates. Besides that, these corpora are developed rather intuitively. In this effort, which we call BioOntoFN, we created a corpus from the domain knowledge provided by an ontology. By doing this we believe that we can provide a rough set of rules to create corpora from ontologies. We also designed an annotation tool specifically for building our corpus. We built a corpus for biological transport events. The ontology we used is the piece of Gene Ontology pertaining to transport, the term transport (GO:0006810) and all of its child concepts, which could be called a sub-ontology. The annotation of the corpus follows the rules of FrameNet and the output is annotated text that is in an XML format similar to that of FrameNet. The text for the corpus is taken from abstracts of MEDLINE articles. The annotation tool is a GUI created using Java. Text mining Biomedical text mining Natural Language Processing
149	Using Rhetorical Figures and Shallow Attributes as a Metric of Intent in Text Strommer, Claus Walter January 2011 (has links) In this thesis we propose a novel metric of document intent evaluation based on the detection and classification of rhetorical figures. In doing so we dispel the notion that rhetoric lacks the structure and consistency necessary to be relevant to computational linguistics. We show how the combination of document attributes available through shallow parsing and rules extracted from the definitions of rhetorical figures produces a metric which can be used to reliably classify the intent of texts. This metric works equally well on entire documents as on portions of a document. rhetoric classification natural language processing computational linguistics epanaphora Computer Science
150	Generating natural language text in response to questions about database structure / McKeown, Kathleen R. January 1900 (has links) Thesis (Ph. D.)--University of Pennsylvania, 1982. / Cover title. Includes bibliographical references and index.