Spelling suggestions: "subject:"NLP (batural anguage aprocessing)"" "subject:"NLP (batural anguage eprocessing)""
1 |
The use of systems engineering principles for the integration of existing models and simulationsLuff, Robert January 2017 (has links)
With the rise in computational power, the prospect of simulating a complex engineering system with a high degree of accuracy and in a meaningful way is becoming a real possibility. Modelling and simulation have become ubiquitous throughout the engineering life cycle, as a consequence there are many thousands of existing models and simulations that are potential candidates for integration. This work is concerned with ascertaining if systems engineering principles are of use in the support of virtual testing, from desire to test, designing experiments, specifying simulations, selecting models and simulations, integrating component parts, verifying that the work is as specified, and validating that any outcomes are meaningful. A novel representation of systems engineering framework is proposed and forms the bases for the methods that were developed. It takes the core systems engineering principles and expresses them in a way that can be implemented in a variety of ways. An end to end process for virtual testing with the potential to use existing models and simulations is proposed, it provides structure and order to the testing task. A key part of the proposed process is the recognition that models and simulations requirements are different from those of the system being designed, and hence a modelling and simulation specific writing guide is produced. The automation of any engineering task has the potential to reduce the time to market of the final product, for this reason the potential of natural language processing technology to hasten the proposed processes was investigated. Two case studies were selected to test and demonstrate the potential of the novel approach, the first being an investigation into material selection for a squash ball, and the second being automotive in nature concerned with combining steering and braking systems. The processes and methods indicated their potential value, especially in the automotive case study where inconsistences were identified that could have otherwise affected the successful integration. This capability, combined with the verification stages, improves the confidence of any model and simulation integration. The NLP proof of concept software also demonstrated that such technology has value in the automation of integration. With further testing and development there is the possibility to create a software package to guide engineers through the difficult task of virtual testing. Such a tool would have the potential to drastically reduce the time to market of complex products.
|
2 |
Knowledge acquisition from user reviews for interactive question answeringKonstantinova, Natalia January 2013 (has links)
Nowadays, the effective management of information is extremely important for all spheres of our lives and applications such as search engines and question answering systems help users to find the information that they need. However, even when assisted by these various applications, people sometimes struggle to find what they want. For example, when choosing a product customers can be confused by the need to consider many features before they can reach a decision. Interactive question answering (IQA) systems can help customers in this process, by answering questions about products and initiating a dialogue with the customers when their needs are not clearly defined. The focus of this thesis is how to design an interactive question answering system that will assist users in choosing a product they are looking for, in an optimal way, when a large number of similar products are available. Such an IQA system will be based on selecting a set of characteristics (also referred to as product features in this thesis), that describe the relevant product, and narrowing the search space. We believe that the order in which these characteristics are presented in terms of these IQA sessions is of high importance. Therefore, they need to be ranked in order to have a dialogue which selects the product in an efficient manner. The research question investigated in this thesis is whether product characteristics mentioned in user reviews are important for a person who is likely to purchase a product and can therefore be used when designing an IQA system. We focus our attention on products such as mobile phones; however, the proposed techniques can be adapted for other types of products if the data is available. Methods from natural language processing (NLP) fields such as coreference resolution, relation extraction and opinion mining are combined to produce various rankings of phone features. The research presented in this thesis employs two corpora which contain texts related to mobile phones specifically collected for this thesis: a corpus of Wikipedia articles about mobile phones and a corpus of mobile phone reviews published on the Epinions.com website. Parts of these corpora were manually annotated with coreference relations, mobile phone features and relations between mentions of the phone and its features. The annotation is used to develop a coreference resolution module as well as a machine learning-based relation extractor. Rule-based methods for identification of coreference chains describing the phone are designed and thoroughly evaluated against the annotated gold standard. Machine learning is used to find links between mentions of the phone (identified by coreference resolution) and phone features. It determines whether some phone feature belong to the phone mentioned in the same sentence or not. In order to find the best rankings, this thesis investigates several settings. One of the hypotheses tested here is that the relatively low results of the proposed baseline are caused by noise introduced by sentences which are not directly related to the phone and phone feature. To test this hypothesis, only sentences which contained mentions of the mobile phone and a phone feature linked to it were processed to produce rankings of the phones features. Selection of the relevant sentences is based on the results of coreference resolution and relation extraction. Another hypothesis is that opinionated sentences are a good source for ranking the phone features. In order to investigate this, a sentiment classification system is also employed to distinguish between features mentioned in positive and negative contexts. The detailed evaluation and error analysis of the methods proposed form an important part of this research and ensure that the results provided in this thesis are reliable.
|
3 |
Enfrentamento do problema das divergências de tradução por um sistema de tradução automática : um exercício exploratório /Oliveira, Mirna Fernanda de. January 2006 (has links)
Orientador: Bento Carlos Dias da Silva / Banca: Beatriz Nunes de Oliveira Longo / Banca: Dirce Charara Monteiro / Banca: Gladis Maria de Barcellos Almeida / Banca: Heronides Maurílio de Melo Moura / Resumo: O objetivo desta tese é desenvolver um estudo lingüístico-computacional exploratório de um problema específico que deve ser enfrentado por sistemas de tradução automática: o problema da divergências de tradução quer de natureza sintática quer de natureza léxico-semântica que se verificam entre pares de sentenças de línguas naturais diferentes. Para isso, fundamenta-se na metodologia de pesquisa interdisciplinar em PLN (Processamento Automático de Línguas Naturais) de Dias-da-Silva (1996, 1998 e 2003) e na teoria lingüístico-computacional subjacente ao sistema de tradução automática UNITRAN de Dorr (1993), que, por sua vez é subsidiado pela teoria sintática dos princípios e Parâmetros de Chomsky (1981) e pela teoria semântica das Estruturas conceituais de Jackendoff (1990). Como contribuição, a tese descreve a composição e o funcionamento do UNITRAN, desenhado para dar conta de parte do problema posto pelas divergências de tradução e ilustra a possibilidade de inclusão do português nesse sistema através do exame de alguns tipos de divergências que se verificam entre frases do inglês e do português. / Abstract: This dissertation aims to develop an exploratory linguistic and computational study of an especific type of problem that must be faced by machine translation systems: the problem of translation divergences, whether syntactic or lexical-semantic ones that can be verified between distinct natural language sentence. In order to achieve this aim, this work is based on the interdisciplinary research metodology of the NLP (Natural Language Processing) field developed by Dias-da-Silva (1996, 1998 & 2003) and on the linguistic computacional theory behind UNITRAN, a machine translation systemdeveloped by Dorr (1993), a system that is on its turned based on Chomsky's syntactic theory of Government and Binding (1981) and Jackendoff's semantic theory of Conceptual Structures (1990). As a contribution to the field of NLP, this dissertation describes the machinery of UNITRAN, designed to deal with part of the problem of translation divergencies, and it illustrates the possibility of including Brazilian Portuguese language in the system through the investigation of certain kinds of divergences that can be found between English and Brazilian Portuguese senteces. / Doutor
|
4 |
ENHANCING ELECTRONIC HEALTH RECORDS SYSTEMS AND DIAGNOSTIC DECISION SUPPORT SYSTEMS WITH LARGE LANGUAGE MODELSFurqan Ali Khan (19203916) 26 July 2024 (has links)
<p dir="ltr">Within Electronic Health Record (EHR) Systems, physicians face extensive documentation, leading to alarming mental burnout. The disproportionate focus on data entry over direct patient care underscores a critical concern. Integration of Natural Language Processing (NLP) powered EHR systems offers relief by reducing time and effort in record maintenance.</p><p dir="ltr">Our research introduces the Automated Electronic Health Record System, which not only transcribes dialogues but also employs advanced clinical text classification. With an accuracy exceeding 98.97%, it saves over 90% of time compared to manual entry, as validated on MIMIC III and MIMIC IV datasets.</p><p dir="ltr">In addition to our system's advancements, we explore integration of Diagnostic Decision Support System (DDSS) leveraging Large Language Models (LLMs) and transformers, aiming to refine healthcare documentation and improve clinical decision-making. We explore the advantages, like enhanced accuracy and contextual understanding, as well as the challenges, including computational demands and biases, of using various LLMs.</p>
|
5 |
Data-driven language understanding for spoken dialogue systemsMrkšić, Nikola January 2018 (has links)
Spoken dialogue systems provide a natural conversational interface to computer applications. In recent years, the substantial improvements in the performance of speech recognition engines have helped shift the research focus to the next component of the dialogue system pipeline: the one in charge of language understanding. The role of this module is to translate user inputs into accurate representations of the user goal in the form that can be used by the system to interact with the underlying application. The challenges include the modelling of linguistic variation, speech recognition errors and the effects of dialogue context. Recently, the focus of language understanding research has moved to making use of word embeddings induced from large textual corpora using unsupervised methods. The work presented in this thesis demonstrates how these methods can be adapted to overcome the limitations of language understanding pipelines currently used in spoken dialogue systems. The thesis starts with a discussion of the pros and cons of language understanding models used in modern dialogue systems. Most models in use today are based on the delexicalisation paradigm, where exact string matching supplemented by a list of domain-specific rephrasings is used to recognise users' intents and update the system's internal belief state. This is followed by an attempt to use pretrained word vector collections to automatically induce domain-specific semantic lexicons, which are typically hand-crafted to handle lexical variation and account for a plethora of system failure modes. The results highlight the deficiencies of distributional word vectors which must be overcome to make them useful for downstream language understanding models. The thesis next shifts focus to overcoming the language understanding models' dependency on semantic lexicons. To achieve that, the proposed Neural Belief Tracking (NBT) model forsakes the use of standard one-hot n-gram representations used in Natural Language Processing in favour of distributed representations of user utterances, dialogue context and domain ontologies. The NBT model makes use of external lexical knowledge embedded in semantically specialised word vectors, obviating the need for domain-specific semantic lexicons. Subsequent work focuses on semantic specialisation, presenting an efficient method for injecting external lexical knowledge into word vector spaces. The proposed Attract-Repel algorithm boosts the semantic content of existing word vectors while simultaneously inducing high-quality cross-lingual word vector spaces. Finally, NBT models powered by specialised cross-lingual word vectors are used to train multilingual belief tracking models. These models operate across many languages at once, providing an efficient method for bootstrapping language understanding models for lower-resource languages with limited training data.
|
6 |
Enfrentamento do problema das divergências de tradução por um sistema de tradução automática: um exercício exploratórioOliveira, Mirna Fernanda de [UNESP] 25 April 2006 (has links) (PDF)
Made available in DSpace on 2014-06-11T19:32:47Z (GMT). No. of bitstreams: 0
Previous issue date: 2006-04-25Bitstream added on 2014-06-13T20:43:58Z : No. of bitstreams: 1
oliveira_mf_dr_ararafcl.pdf: 631650 bytes, checksum: fa4233637c661c5e993adcc08801d158 (MD5) / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / O objetivo desta tese é desenvolver um estudo lingüístico-computacional exploratório de um problema específico que deve ser enfrentado por sistemas de tradução automática: o problema da divergências de tradução quer de natureza sintática quer de natureza léxico-semântica que se verificam entre pares de sentenças de línguas naturais diferentes. Para isso, fundamenta-se na metodologia de pesquisa interdisciplinar em PLN (Processamento Automático de Línguas Naturais) de Dias-da-Silva (1996, 1998 e 2003) e na teoria lingüístico-computacional subjacente ao sistema de tradução automática UNITRAN de Dorr (1993), que, por sua vez é subsidiado pela teoria sintática dos princípios e Parâmetros de Chomsky (1981) e pela teoria semântica das Estruturas conceituais de Jackendoff (1990). Como contribuição, a tese descreve a composição e o funcionamento do UNITRAN, desenhado para dar conta de parte do problema posto pelas divergências de tradução e ilustra a possibilidade de inclusão do português nesse sistema através do exame de alguns tipos de divergências que se verificam entre frases do inglês e do português. / This dissertation aims to develop an exploratory linguistic and computational study of an especific type of problem that must be faced by machine translation systems: the problem of translation divergences, whether syntactic or lexical-semantic ones that can be verified between distinct natural language sentence. In order to achieve this aim, this work is based on the interdisciplinary research metodology of the NLP (Natural Language Processing) field developed by Dias-da-Silva (1996, 1998 & 2003) and on the linguistic computacional theory behind UNITRAN, a machine translation systemdeveloped by Dorr (1993), a system that is on its turned based on Chomsky's syntactic theory of Government and Binding (1981) and Jackendoff's semantic theory of Conceptual Structures (1990). As a contribution to the field of NLP, this dissertation describes the machinery of UNITRAN, designed to deal with part of the problem of translation divergencies, and it illustrates the possibility of including Brazilian Portuguese language in the system through the investigation of certain kinds of divergences that can be found between English and Brazilian Portuguese senteces.
|
7 |
Databáze XML pro správu slovníkových dat / XML Databases for Dictionary Data ManagementSamia, Michel January 2011 (has links)
The following diploma thesis deals with dictionary data processing, especially those in XML based formats. At first, the reader is acquainted with linguistic and lexicographical terms used in this work. Then particular lexicographical data format types and specific formats are introduced. Their advantages and disadvantages are discussed as well. According to previously set criteria, the LMF format has been chosen for design and implementation of Python application, which focuses especially on intelligent merging of more dictionaries into one. After passing all unit tests, this application has been used for processing LMF dictionaries, located on the faculty server of the research group for natural language processing. Finally, the advantages and disadvantages of this application are discussed and ways of further usage and extension are suggested.
|
8 |
Génération de données synthétiques pour l'adaptation hors-domaine non-supervisée en réponse aux questions : méthodes basées sur des règles contre réseaux de neuronesDuran, Juan Felipe 02 1900 (has links)
Les modèles de réponse aux questions ont montré des résultats impressionnants sur plusieurs ensembles de données et tâches de réponse aux questions. Cependant, lorsqu'ils sont testés sur des ensembles de données hors domaine, la performance diminue. Afin de contourner l'annotation manuelle des données d'entraînement du nouveau domaine, des paires de questions-réponses peuvent être générées synthétiquement à partir de données non annotées. Dans ce travail, nous nous intéressons à la génération de données synthétiques et nous testons différentes méthodes de traitement du langage naturel pour les deux étapes de création d'ensembles de données : génération de questions et génération de réponses. Nous utilisons les ensembles de données générés pour entraîner les modèles UnifiedQA et Bert-QA et nous les testons sur SCIQ, un ensemble de données hors domaine sur la physique, la chimie et la biologie pour la tâche de question-réponse à choix multiples, ainsi que sur HotpotQA, TriviaQA, NatQ et SearchQA, quatre ensembles de données hors domaine pour la tâche de question-réponse. Cette procédure nous permet d'évaluer et de comparer les méthodes basées sur des règles avec les méthodes de réseaux neuronaux. Nous montrons que les méthodes basées sur des règles produisent des résultats supérieurs pour la tâche de question-réponse à choix multiple, mais que les méthodes de réseaux neuronaux produisent généralement des meilleurs résultats pour la tâche de question-réponse. Par contre, nous observons aussi qu'occasionnellement, les méthodes basées sur des règles peuvent compléter les méthodes de réseaux neuronaux et produire des résultats compétitifs lorsqu'on entraîne Bert-QA avec les bases de données synthétiques provenant des deux méthodes. / Question Answering models have shown impressive results in several question answering datasets and tasks. However, when tested on out-of-domain datasets, the performance decreases. In order to circumvent manually annotating training data from the new domain, question-answer pairs can be generated synthetically from unnanotated data. In this work, we are interested in the generation of synthetic data and we test different Natural Language Processing methods for the two steps of dataset creation: question/answer generation. We use the generated datasets to train QA models UnifiedQA and Bert-QA and we test it on SCIQ, an out-of-domain dataset about physics, chemistry, and biology for MCQA, and on HotpotQA, TriviaQA, NatQ and SearchQA, four out-of-domain datasets for QA. This procedure allows us to evaluate and compare rule-based methods with neural network methods. We show that rule-based methods yield superior results for the multiple-choice question-answering task, but neural network methods generally produce better results for the question-answering task. However, we also observe that occasionally, rule-based methods can complement neural network methods and produce competitive results when training Bert-QA with synthetic databases derived from both methods.
|
Page generated in 0.1004 seconds