141

A principle-based system for natural language analysis and translation

Crocker, Matthew Walter January 1988 (has links)
Traditional views of grammatical theory hold that languages are characterised by sets of constructions. This approach entails the enumeration of all possible constructions for each language being described. Current theories of transformational generative grammar have established an alternative position. Specifically, Chomsky's Government-Binding theory proposes a system of principles which are common to human language. Such a theory is referred to as a "Universal Grammar" (UG). Associated with the principles of grammar are parameters of variation which account for the diversity of human languages. The grammar for a particular language is known as a "Core Grammar", and is characterised by an appropriately parametrised instance of UG. Despite these advances in linguistic theory, construction-based approaches have remained the status quo within the field of natural language processing. This thesis investigates the possibility of developing a principle-based system which reflects the modular nature of the linguistic theory. That is, rather than stipulating the possible constructions of a language, a system is developed which uses the principles of grammar and language-specific parameters to parse language. Specifically, a system is presented which performs syntactic analysis and translation for a subset of English and German. The cross-linguistic nature of the theory is reflected by the system, which can be considered a procedural model of UG. / Science, Faculty of / Computer Science, Department of / Graduate
142

Improving Eligibility Prescreening for Alzheimer’s Disease and Related Dementias Clinical Trials with Natural Language Processing

Idnay, Betina Ross Saldua January 2022 (has links)
Alzheimer’s disease and related dementias (ADRD) are among the leading causes of disability and mortality in the older population worldwide and a costly public health issue, yet there is still no treatment that prevents or cures them. Clinical trials are available, but successful recruitment has been a longstanding challenge. One strategy to improve recruitment is conducting eligibility prescreening, a resource-intensive process in which clinical research staff manually go through electronic health records to identify potentially eligible patients. Natural language processing (NLP), an informatics approach used to extract relevant data from various structured and unstructured data types, may improve eligibility prescreening for ADRD clinical trials. Guided by the Fit between Individuals, Task, and Technology framework, this dissertation research aims to optimize eligibility prescreening for ADRD clinical research by evaluating the sociotechnical factors influencing the adoption of NLP-driven tools. A systematic review of the literature was conducted to identify NLP systems that have been used for eligibility prescreening in clinical research. Following this, three NLP-driven tools were evaluated in ADRD clinical research eligibility prescreening: Criteria2Query, i2b2, and Leaf. We conducted an iterative mixed-methods usability evaluation with twenty clinical research staff using a cognitive walkthrough with a think-aloud protocol, the Post-Study System Usability Questionnaire, and a directed deductive content analysis. Moreover, we conducted a cognitive task analysis with sixty clinical research staff to assess the impact of cognitive complexity on the usability of NLP systems and identify the sociotechnical gaps and cognitive support needed in using NLP systems for ADRD clinical research eligibility prescreening. The results show that understanding the role of NLP systems in improving eligibility prescreening is critical to the advancement of clinical research recruitment. All three systems are generally usable and accepted by a group of clinical research staff. The cognitive walkthrough and think-aloud protocol informed iterative system refinement, resulting in high system usability. Cognitive complexity has no significant effect on system usability; however, the system, order of evaluation, job position, and computer literacy are associated with system usability. Key recommendations for system development and implementation include improving system intuitiveness and overall user experience through comprehensive consideration of user needs and task completion requirements, and implementing focused training on database querying to improve clinical research staff’s aptitude in eligibility prescreening and advance workforce competency. Finally, this study contributes to our understanding of the conduct of electronic eligibility prescreening for ADRD clinical research by clinical research staff. Findings from this study highlight the importance of leveraging human-computer collaboration in conducting eligibility prescreening using NLP-driven tools, which provide an opportunity to identify and enroll participants of diverse backgrounds who are eligible for ADRD clinical research and accelerate treatment development.
143

Toward Annotation Efficiency in Biased Learning Settings for Natural Language Processing

Effland, Thomas January 2023 (has links)
The goal of this thesis is to improve the feasibility of building applied NLP systems for more diverse and niche real-world use cases of extracting structured information from text. A core factor in determining this feasibility is the cost of manually annotating enough unbiased labeled data to achieve a desired level of system accuracy, and our goal is to reduce this cost. We focus on reducing this cost by making contributions in two directions: (1) easing the annotation burden by leveraging high-level expert knowledge in addition to labeled examples, thus making approaches more annotation-efficient; and (2) mitigating known biases in cheaper, imperfectly labeled real-world datasets so that we may use them to our advantage. A central theme of this thesis is that high-level expert knowledge about the data and task can allow for biased labeling processes that focus experts on only manually labeling aspects of the data that cannot be easily labeled through cheaper means. This combination allows for more accurate models with less human effort. We conduct our research on this general topic through three diverse problems with immediate applications to real-world settings. First, we study an applied problem in biased text classification. We encounter a rare-event text classification system that has been deployed for several years. We are tasked with improving this system's performance using only the severely biased incidental feedback provided by the experts over years of system use. We develop a method that combines importance weighting and an unlabeled data imputation scheme that exploits the selection bias of the feedback to train an unbiased classifier without requiring additional labeled data. We experimentally demonstrate that this method considerably improves the system performance. Second, we tackle an applied problem in named entity recognition (NER) concerning learning tagging models from data that have very low recall for annotated entities. To solve this issue, we propose a novel loss, the Expected Entity Ratio (EER), that uses an uncertain estimate of the proportion of entities in the data to counteract the false-negative bias in the data, encouraging the model to have the correct ratio of entities in expectation. We justify the principles of our approach by providing theory that shows it recovers the true tagging distribution under mild conditions. Additionally, we provide extensive empirical results that show it to be practically useful. Empirically, we find that it meets or exceeds the performance of state-of-the-art baselines across a variety of languages, annotation scenarios, and amounts of labeled data. We also show that, when combined with our approach, a novel sparse annotation scheme can outperform exhaustive annotation for modest annotation budgets. Third, we study the challenging problem of syntactic parsing in low-resource languages. We approach the problem from a cross-lingual perspective, building on a state-of-the-art transfer-learning approach that underperforms on "distant" languages that have little to no representation in the training corpus. Motivated by the field of syntactic typology, we introduce a general method called Expected Statistic Regularization (ESR) to regularize the parser on distant languages according to their expected typological syntax statistics. We also contribute general approaches for estimating the loss supervision parameters from the task formalism or small amounts of labeled data.
We present seven broad classes of descriptive statistic families and provide extensive experimental evidence showing that using these statistics for regularization is complementary to deep learning approaches in low-resource transfer settings. In conclusion, this thesis contributes approaches for reducing the annotation cost of building applied NLP systems through the use of high-level expert knowledge to impart additional learning signal on models and cope with cheaper biased data. We publish implementations of our methods and results, so that they may facilitate future research and applications. It is our hope that the frameworks proposed in this thesis will help to democratize access to NLP for producing structured information from text in wider-reaching applications by making them faster and cheaper to build.
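
The Expected Entity Ratio idea summarized above lends itself to a compact illustration: penalize the model when the expected fraction of entity tokens drifts too far from a prior estimate. The sketch below is a minimal PyTorch rendering of that idea, not the thesis's implementation; the function name, the tag layout (index 0 as the non-entity tag), and the target ratio and margin values are assumptions made for the example.

```python
import torch

def expected_entity_ratio_penalty(tag_log_probs, target_ratio=0.15, margin=0.05):
    """Hinge penalty on the gap between the model's expected entity ratio
    and a prior estimate of that ratio (a sketch of the EER idea).

    tag_log_probs: tensor of shape (batch, seq_len, num_tags) with per-token
    log-probabilities; index 0 is assumed to be the non-entity ('O') tag.
    """
    probs = tag_log_probs.exp()
    p_entity = 1.0 - probs[..., 0]      # P(token belongs to some entity)
    expected_ratio = p_entity.mean()    # expected fraction of entity tokens
    # Only deviations beyond the allowed margin around the prior are penalized.
    return torch.clamp((expected_ratio - target_ratio).abs() - margin, min=0.0)
```

In a partially annotated setting, a term like this would typically be added to the usual supervised loss computed only on the labeled tokens, so the unlabeled portion of the data still constrains the model.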
144

Domain-informed Language Models for Process Systems Engineering

Mann, Vipul January 2024 (has links)
Process systems engineering (PSE) involves a systems-level approach to solving problems in chemical engineering related to process modeling, design, control, and optimization, and entails modeling interactions between various systems (and subsystems) governing the process. This requires using a combination of mathematical methods, physical intuition, and, more recently, machine learning techniques. Recently, language models have seen tremendous advances due to new and more efficient model architectures (such as transformers), greater computing power, and large volumes of training data. Many of these language models could be appropriately adapted to solve several PSE-related problems. However, language models are inherently complex and are often characterized by several million parameters, which can only be trained efficiently in data-rich areas, unlike PSE. Moreover, PSE is characterized by decades of rich process knowledge that must be utilized during model training to avoid a mismatch between process knowledge and data-driven language models. This thesis presents a framework for building domain-informed language models for several central problems in PSE spanning multiple scales. Specifically, the frameworks presented address molecular property prediction, forward and retrosynthesis reaction outcome prediction, chemical flowsheet representation and generation, pharmaceutical information extraction, and reaction classification. Domain knowledge is integrated with language models using custom model architectures, standard and custom-built ontologies, linguistics-inspired chemistry and process flowsheet grammars, adapted problem formulations, graph theory techniques, and so on. This thesis is intended to provide a path for future developments of domain-informed language models in process systems engineering that respect domain knowledge while leveraging their computational advantages.
145

A temporal analysis of natural language narrative text

Ramachandran, Venkateshwaran 12 March 2009 (has links)
Written English texts in the form of narratives often describe events that occur in a definite chronological sequence. Understanding the concept of time in such texts is an essential aspect of text comprehension and forms the basis for answering time-related questions pertaining to the source text. It is our hypothesis that time in such texts is expressed in terms of temporal orderings of the situations described and can be modelled by a linear representation of these situations. This representation conforms to the traditional view of the linearity of time, in which time is regarded as a horizontal line called the timeline. Information indicating the temporal ordering of events is often explicitly specified in the source text. Where such indicators are missing, semantic relations between the events enforce temporal orderings. This thesis proposes and implements a practical model for automatically processing paragraphs of narrative fiction for explicit chronological information and employing certain guidelines for inferring such information in the absence of explicit indications. Although we cannot claim to have altogether eliminated the need for expensive semantic inferencing within our model, we have certainly devised guidelines to eliminate the expense in certain cases where explicit temporal indicators are missing. We have also characterized, through our test data, some cases where semantic inferencing proves necessary to augment the capabilities of our model. / Master of Science
146

Natural language interface to a VHDL modeling tool

Manek, Meenakshi 23 June 2009 (has links)
This thesis describes a Natural Language (NL) interface to a VHDL modeling tool called the Modeler's Assistant. The primary motivation for the interface developed in this research work is to permit VLSI modelers who are not proficient in VHDL to rapidly produce correct VHDL models from manufacturers' descriptions. This tool should also be useful in teaching the VHDL language. The Modeler's Assistant supports graphical capture of behavioral models in the form of Process Model Graphs (PMGs) consisting of processes (nodes) interconnected by signals (arcs). The NL interface that has been constructed allows modelers to specify the behavior for the process nodes using a restricted form of English called ModelSpeak. A spell-checking routine (from the UNIX operating system) is invoked to reduce input errors. Also, the grammar employed accepts multi-sentence descriptions rather than just a single sentence. Correct VHDL for each process is synthesized automatically, but user interaction is solicited where needed to resolve ambiguities such as the scope of loops and the types of signals and variables. The Modeler's Assistant can then assemble the VHDL code for these processes, along with the information about the interface description from the PMG, into a complete entity model. / Master of Science
147

Larger-first partial parsing

Van Delden, Sebastian Alexander 01 January 2003 (has links) (PDF)
Larger-first partial parsing is a primarily top-down approach to partial parsing that is the opposite of current easy-first, or primarily bottom-up, strategies. A rich partial tree structure is captured by an algorithm that assigns a hierarchy of structural tags to each of the input tokens in a sentence. Part-of-speech tags are first assigned to the words in a sentence by a part-of-speech tagger. A cascade of Deterministic Finite State Automata then uses this part-of-speech information to identify syntactic relations primarily in a descending order of their size. The cascade is divided into four specialized sections: (1) a Comma Network, which identifies syntactic relations associated with commas; (2) a Conjunction Network, which partially disambiguates phrasal conjunctions and fully disambiguates clausal conjunctions; (3) a Clause Network, which identifies non-comma-delimited clauses; and (4) a Phrase Network, which identifies the remaining base phrases in the sentence. Each automaton is capable of adding one or more levels of structural tags to the tokens in a sentence. The larger-first approach is compared against a well-known easy-first approach. The results indicate that this larger-first approach is capable of (1) producing a more detailed partial parse than an easy-first approach; (2) providing better containment of attachment ambiguity; (3) handling overlapping syntactic relations; and (4) achieving a higher accuracy than the easy-first approach. The automata of each network were developed by an empirical analysis of several sources and are presented here in detail.
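
To make the cascade idea concrete, here is a deliberately tiny Python sketch of "larger-first" tagging: each pass over the part-of-speech sequence adds one level of structural tags, starting with clause-sized relations and ending with base phrases. The level names and tag templates are invented for the example and are far simpler than the thesis's actual finite-state automata.

```python
# Toy cascade: larger relations are matched before smaller ones, and every
# pass appends one level of structural tags to the tokens it covers.
CASCADE = [
    ("CLAUSE", [["PRP", "VBD", "DT", "JJ", "NN"], ["NNP", "VBZ", "NN"]]),
    ("NP",     [["DT", "JJ", "NN"], ["PRP"], ["NNP"], ["NN"]]),
    ("VP",     [["VBD"], ["VBZ"]]),
]

def larger_first_parse(pos_tags):
    """Assign a hierarchy of structural tags (one list per token)."""
    structure = [[] for _ in pos_tags]
    for level, templates in CASCADE:
        for template in templates:
            n = len(template)
            for i in range(len(pos_tags) - n + 1):
                if pos_tags[i:i + n] == template:
                    for j in range(i, i + n):
                        if level not in structure[j]:
                            structure[j].append(level)
    return list(zip(pos_tags, structure))

# e.g. "She bought a red car" -> PRP VBD DT JJ NN
print(larger_first_parse(["PRP", "VBD", "DT", "JJ", "NN"]))
```

Each token ends up with a small stack of tags (here, CLAUSE plus NP or VP), which is the flat encoding of a partial tree that the larger-first strategy produces.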
148

Category-theoretic quantitative compositional distributional models of natural language semantics

Grefenstette, Edward Thomas January 2013 (has links)
This thesis is about the problem of compositionality in distributional semantics. Distributional semantics presupposes that the meanings of words are a function of their occurrences in textual contexts. It models words as distributions over these contexts and represents them as vectors in high-dimensional spaces. The problem of compositionality for such models concerns itself with how to produce distributional representations for larger units of text (such as a verb and its arguments) by composing the distributional representations of smaller units of text (such as individual words). This thesis focuses on a particular approach to this compositionality problem, namely the categorical (DisCoCat) framework developed by Coecke, Sadrzadeh, and Clark, which combines syntactic analysis formalisms with distributional semantic representations of meaning to produce syntactically motivated composition operations. This thesis shows how this approach can be theoretically extended and practically implemented to produce concrete compositional distributional models of natural language semantics. It furthermore demonstrates that such models can perform on par with, or better than, other competing approaches in the field of natural language processing. There are three principal contributions to computational linguistics in this thesis. The first is to extend the DisCoCat framework on both the syntactic and semantic fronts, incorporating a number of syntactic analysis formalisms and providing learning procedures allowing for the generation of concrete compositional distributional models. The second contribution is to evaluate the models developed from the procedures presented here, showing that they outperform other compositional distributional models present in the literature. The third contribution is to show how using category theory to solve linguistic problems forms a sound basis for research, illustrated by examples of work on this topic, which also suggest directions for future research.
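
As a rough illustration of what "syntactically motivated composition operations" can look like in practice, the NumPy sketch below represents a transitive verb as a matrix (an order-2 tensor) that acts on its object vector and interacts point-wise with its subject vector, one simple instantiation explored in this line of work. The toy vectors, dimensionality, and the particular composition rule are illustrative assumptions, not the thesis's exact models.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4  # toy dimensionality of the noun space

# Toy distributional vectors (in practice estimated from textual co-occurrences).
dogs = rng.random(dim)
cats = rng.random(dim)
chase = rng.random((dim, dim))  # a transitive verb as a matrix

def compose_svo(subj_vec, verb_mat, obj_vec):
    """One simple tensor-based composition for 'subject verb object':
    the verb matrix acts on the object, and the result interacts
    point-wise with the subject."""
    return subj_vec * (verb_mat @ obj_vec)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

dogs_chase_cats = compose_svo(dogs, chase, cats)
cats_chase_dogs = compose_svo(cats, chase, dogs)
print(cosine(dogs_chase_cats, cats_chase_dogs))
```

The point of the framework is that the shape of the composition (which tensors act on which arguments) is dictated by the syntactic type of the word, so sentence vectors for different word orders are genuinely different objects that can still be compared by cosine similarity.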
149

Computers and Natural Language: Will They Find Happiness Together?

Prall, James W. January 1985 (has links)
Permission from the author to release this work as open access is pending. Please contact the ICS library if you would like to view this work.
150

Role of description logic reasoning in ontology matching

Reul, Quentin H. January 2012 (has links)
Semantic interoperability is essential on the Semantic Web to enable different information systems to exchange data. Ontology matching has been recognised as a means to achieve semantic interoperability on the Web by identifying similar information in heterogeneous ontologies. Existing ontology matching approaches have two major limitations. The first limitation relates to similarity metrics, which provide a pessimistic value when considering complex objects such as strings and conceptual entities. The second limitation relates to the role of description logic reasoning. In particular, most approaches disregard implicit information about entities as a source of background knowledge. In this thesis, we first present a new similarity function, called the degree of commonality coefficient, to compute the overlap between two sets based on the similarity between their elements. The results of our evaluations show that the degree of commonality performs better than traditional set similarity metrics in the ontology matching task. Second, we have developed the Knowledge Organisation System Implicit Mapping (KOSIMap) framework, which differs from existing approaches by using description logic reasoning (i) to extract implicit information as background knowledge for every entity, and (ii) to remove inappropriate correspondences from an alignment. The results of our evaluation show that the use of description logic reasoning in the ontology matching task can increase coverage. We identify people interested in ontology matching and reasoning techniques as the target audience of this work.
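
The motivation for the degree of commonality coefficient is that exact-match set overlap is too pessimistic when set elements (for example, entity labels) are merely similar rather than identical. The Python sketch below shows one way such a soft set overlap could be written; the element similarity (difflib's SequenceMatcher), the averaging scheme, and the function names are assumptions for the example, not the thesis's actual definition.

```python
from difflib import SequenceMatcher

def string_sim(a, b):
    """Element-level similarity between two strings (toy choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def soft_overlap(set_a, set_b, sim=string_sim):
    """Soft set overlap: instead of counting exact matches, each element
    contributes its best similarity to any element of the other set,
    averaged in both directions."""
    if not set_a or not set_b:
        return 0.0
    a_to_b = sum(max(sim(a, b) for b in set_b) for a in set_a) / len(set_a)
    b_to_a = sum(max(sim(b, a) for a in set_a) for b in set_b) / len(set_b)
    return (a_to_b + b_to_a) / 2

labels_1 = {"has part", "component of", "location"}
labels_2 = {"hasPart", "locatedIn"}
print(soft_overlap(labels_1, labels_2))  # > 0 despite no exact matches
```

A plain Jaccard or Dice coefficient would return 0 for these two label sets, whereas a similarity-aware overlap of this kind credits the near-matches, which is the behaviour the abstract argues is needed for ontology matching.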
