81

Syntax-based Concept Extraction For Question Answering

Glinos, Demetrios 01 January 2006 (has links)
Question answering (QA) stands squarely along the path from document retrieval to text understanding. As an area of research interest, it serves as a proving ground where strategies for document processing, knowledge representation, question analysis, and answer extraction may be evaluated in real world information extraction contexts. The task is to go beyond the representation of text documents as "bags of words" or data blobs that can be scanned for keyword combinations and word collocations in the manner of internet search engines. Instead, the goal is to recognize and extract the semantic content of the text, and to organize it in a manner that supports reasoning about the concepts represented. The issue presented is how to obtain and query such a structure without either a predefined set of concepts or a predefined set of relationships among concepts. This research investigates a means for acquiring from text documents both the underlying concepts and their interrelationships. Specifically, a syntax-based formalism for representing atomic propositions that are extracted from text documents is presented, together with a method for constructing a network of concept nodes for indexing such logical forms based on the discourse entities they contain. It is shown that meaningful questions can be decomposed into Boolean combinations of question patterns using the same formalism, with free variables representing the desired answers. It is further shown that this formalism can be used for robust question answering using the concept network and WordNet synonym, hypernym, hyponym, and antonym relationships. This formalism was implemented in the Semantic Extractor (SEMEX) research tool and was tested against the factoid questions from the 2005 Text Retrieval Conference (TREC), which operated upon the AQUAINT corpus of newswire documents. After adjusting for the limitations of the tool and the document set, correct answers were found for approximately fifty percent of the questions analyzed, which compares favorably with other question answering systems.
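As an illustration of the WordNet relationships the abstract relies on, the following minimal NLTK sketch expands a question term through synonym, hypernym, hyponym, and antonym lemmas. The expansion strategy shown is an assumption for illustration, not SEMEX's actual algorithm.

import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def expand_term(word: str) -> set[str]:
    """Collect synonym, hypernym, hyponym, and antonym lemmas for a term."""
    related = set()
    for synset in wn.synsets(word):
        related.update(lemma.name() for lemma in synset.lemmas())      # synonyms
        for hyper in synset.hypernyms():                               # more general concepts
            related.update(lemma.name() for lemma in hyper.lemmas())
        for hypo in synset.hyponyms():                                 # more specific concepts
            related.update(lemma.name() for lemma in hypo.lemmas())
        for lemma in synset.lemmas():                                  # antonyms, where defined
            related.update(a.name() for a in lemma.antonyms())
    return related

print(sorted(expand_term("capital"))[:10])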
82

Leveraging Large Language Models Trained on Code for Symbol Binding

Robinson, Joshua 09 August 2022 (has links) (PDF)
While large language models like GPT-3 have achieved impressive results in the zero-, one-, and few-shot settings, they still significantly underperform on some tasks relative to the state of the art (SOTA). For many tasks it would be useful to have answer options explicitly listed out in a multiple choice format, decreasing computational cost and allowing the model to reason about the relative merits of possible answers. We argue that the reason this hasn't helped models like GPT-3 close the gap with the SOTA is that these models struggle with symbol binding - associating each answer option with a symbol that represents it. To ameliorate this situation we introduce index prompting, a way of leveraging language models trained on code to successfully answer multiple choice formatted questions. When used with the OpenAI Codex model, our method improves accuracy by about 18% on average in the few-shot setting relative to GPT-3 across 8 datasets representing 4 common NLP tasks. It also achieves a new single-model state of the art on ANLI R3, ARC (Easy), and StoryCloze, suggesting that GPT-3's latent "understanding" has been previously underestimated.
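The paper's exact prompt format is not reproduced here, but the general shape of index prompting can be sketched as follows: the multiple-choice question is cast as code, so a model trained on code only has to emit a list index, binding each option to a position rather than reproducing its text. The prompt wording below is an assumption for illustration.

def index_prompt(question: str, options: list[str]) -> str:
    """Format a multiple-choice question as Python code for a code LM."""
    lines = [f'question = "{question}"', "options = ["]
    lines += [f'    "{opt}",' for opt in options]
    # The model completes the bracketed expression with an integer index.
    lines += ["]", "# The index of the correct answer is:", "answer = options["]
    return "\n".join(lines)

print(index_prompt("What is the capital of France?", ["London", "Paris", "Berlin"]))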
83

Numerical Reasoning in NLP: Challenges, Innovations, and Strategies for Handling Mathematical Equivalency

Liu, Qianying 25 September 2023 (has links)
Kyoto University / New-system doctoral program / Doctor of Informatics / Degree No. Kō-24929 / Informatics Doctorate No. 840 / Call number 新制||情||140 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / Examination committee: Program-Specific Professor Sadao Kurohashi (chair), Professor Tatsuya Kawahara, Professor Ko Nishino / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
84

Studies on Question Answering in Open-Book and Closed-Book Settings

Alkhaldi, Tareq Yaser Samih 25 September 2023 (has links)
Kyoto University / New-system doctoral program / Doctor of Informatics / Degree No. Kō-24930 / Informatics Doctorate No. 841 / Call number 新制||情||141 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / Examination committee: Program-Specific Professor Sadao Kurohashi (chair), Professor Tatsuya Kawahara, Professor Hisashi Kashima / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
85

Question Answering on the Textbook 'Health Information Systems' Using Unsupervised Training of a Pretrained Transformer

Keller, Paul 27 November 2023 (has links)
Extracting knowledge from books is essential and complex. Easy and complete access to knowledge is particularly important in medical informatics. In this thesis, a pretrained language model was used to make the content of the book Health Information Systems by Winter et al. (2023) more efficiently and easily accessible. During training, the quality of the model was evaluated at several checkpoints: the model answered examination questions from the book and from Leipzig University modules that build on the book's content. Finally, the training checkpoints were compared with the model without further training and with the state-of-the-art model GPT-4. With a macro-F1 score of 0.7, GPT-4 achieved the highest correctness in answering the exam questions; none of the other models matched this. However, continual training raised the fine-tuned model's performance from an initial macro-F1 score of 0.13 to 0.33. The results show a clear performance gain from this approach and provide a basis for future extensions. This demonstrates the feasibility of answering questions about health information systems and solving a sample exam with further-trained language models; the models do not, however, reach practical applicability, since their performance lies below the current state of the art and they cannot answer the majority of the questions completely correctly. [Contents: 1 Introduction (subject, problem, motivation, objectives, GMDS ethical guidelines, task, structure) / 2 Foundations (language models and Transformer architectures, neural networks, data processing) / 3 State of research (continual pretraining, current models and their usability, open problems) / 4 Approach (model selection, data curation and text extraction, unsupervised continual training, exam questions, model evaluation) / 5 Implementation (model download, training and DeepSpeed configuration, GPU cluster training, answer generation, evaluation criteria: correctness, explainability, question understanding, robustness) / 6 Results / 7 Discussion (model limits, problems with core questions, grading with exam points) / 8 Outlook (model scaling and quantization, human reinforcement learning, dataset growth, domain-specific models, adapter-based training, text extraction from context, retrieval augmented generation)]
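Macro-F1, the headline metric above, averages per-class F1 scores so that each answer grade counts equally regardless of frequency. A minimal scikit-learn sketch, with illustrative grade labels that are not taken from the thesis:

from sklearn.metrics import f1_score

# Hypothetical graded exam answers: gold grades vs. model-assigned grades.
gold = ["correct", "incorrect", "partial", "correct", "incorrect"]
pred = ["correct", "partial", "partial", "incorrect", "incorrect"]

# average="macro" weights the grade classes equally, however rare.
print(f1_score(gold, pred, average="macro"))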
86

Grounded and Consistent Question Answering

Alberti, Christopher Brian January 2023 (has links)
This thesis describes advancements in question answering along three general directions: model architecture extensions, explainable question answering, and data augmentation. Chapter 2 describes the first state-of-the-art model for the Natural Questions dataset based on pretrained transformers. Chapters 3 and 4 describe extensions to the model architecture designed to accommodate long textual inputs and multimodal text+image inputs, establishing new state-of-the-art results on the Natural Questions and on the VCR dataset. Chapter 5 shows that significant improvements can be obtained with data augmentation on the SQuAD and Natural Questions datasets, introducing roundtrip consistency as a simple heuristic to improve the quality of synthetic data. In Chapters 6 and 7 we explore explainable question answering, demonstrating the usefulness of a new, concrete kind of structured explanation, QED, and proposing a semantic analysis of why-questions in the Natural Questions as a way of better understanding the nature of real-world explanations. Finally, in Chapters 8 and 9 we delve into more exploratory data augmentation techniques for question answering. We look respectively at how straight-through gradients can be utilized to optimize roundtrip consistency in a pipeline of models on the fly, and at how very recent large language models like PaLM can be used to generate synthetic question answering datasets for new languages given as few as five representative examples per language.
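Roundtrip consistency, the filtering heuristic of Chapter 5, can be read as: keep a synthetic (question, answer) pair only if a QA model re-derives the original answer from the generated question. A minimal sketch under that reading; the QA callable is a placeholder, not the thesis's model:

from typing import Callable, List, Tuple

def roundtrip_filter(
    pairs: List[Tuple[str, str, str]],           # (context, question, answer)
    answer_question: Callable[[str, str], str],  # placeholder QA model
) -> List[Tuple[str, str, str]]:
    """Keep synthetic pairs whose answer survives the roundtrip."""
    kept = []
    for context, question, answer in pairs:
        # Re-answer the generated question; drop the pair on a mismatch.
        if answer_question(context, question).strip().lower() == answer.strip().lower():
            kept.append((context, question, answer))
    return kept

# Example with a trivial stand-in model that extracts the last word.
demo = [("Paris is the capital of France.", "What is the capital of France?", "France.")]
print(roundtrip_filter(demo, lambda ctx, q: ctx.split()[-1]))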
87

Transfer Learning and Attention Mechanisms in a Multimodal Setting

Greco, Claudio 13 May 2022 (has links)
Humans are able to develop a solid knowledge of the world around them: they can leverage information coming from different sources (e.g., language, vision), focus on the most relevant information in the input they receive in a given situation, and exploit what they have learned before without forgetting it. In the fields of Artificial Intelligence and Computational Linguistics, replicating these human abilities in artificial models is a major challenge. Recently, models based on pre-training and on attention mechanisms, namely pre-trained multimodal Transformers, have been developed. They seem to perform tasks surprisingly well compared to other computational models in multiple contexts. They simulate human-like cognition in that they supposedly rely on previously acquired knowledge (transfer learning) and focus on the most important information in the input (attention mechanisms). Nevertheless, we still do not know whether these models can deal with multimodal tasks that require merging different types of information simultaneously, as humans would do. This thesis attempts to fill this gap in our knowledge of multimodal models by investigating the ability of pre-trained Transformers to encode multimodal information, and the ability of attention-based models to remember how to deal with previously solved tasks. With regard to pre-trained Transformers, we focus on their ability to rely on pre-training and on attention while dealing with tasks that require merging information coming from language and vision. More precisely, we investigate whether pre-trained multimodal Transformers are able to understand the internal structure of a dialogue (e.g., the organization of turns); to solve complex spatial questions that require processing different spatial elements (e.g., regions of the image, proximity between elements, etc.); and to make predictions based on complementary multimodal cues (e.g., guessing the most plausible action by leveraging the content of a sentence and of an image). The results of this thesis indicate that pre-trained Transformers outperform other models: they are able, to some extent, to integrate complementary multimodal information, and they manage to pinpoint both the relevant turns in a dialogue and the most important regions in an image. These results suggest that pre-training and attention play a key role in how pre-trained Transformers encode their input. Nevertheless, their way of processing information cannot be considered human-like. When compared to humans, they struggle (as non-pre-trained models do) to understand negative answers, to merge spatial information in difficult questions, and to predict actions based on complementary linguistic and visual cues. With regard to attention-based models, we found that these models tend to forget what they have learned in previously solved tasks. However, training these models on easy tasks before more complex ones seems to mitigate this catastrophic forgetting phenomenon. These results indicate that, at least in this context, attention-based models (and, presumably, pre-trained Transformers too) are sensitive to task order. Better control of this variable may therefore help multimodal models learn sequentially and continuously as humans do.
88

Identifying reputation collectors in community question answering (CQA) sites: Exploring the dark side of social media

Roy, P.K., Singh, J.P., Baabdullah, A.M., Kizgin, Hatice, Rana, Nripendra P. 08 August 2019 (has links)
This research aims to identify users who post, as well as encourage others to post, low-quality and duplicate content on community question answering sites. The good guys, called Caretakers, and the bad guys, called Reputation Collectors, are characterised by their behaviour, answering patterns, and reputation points. The proposed system is developed and analysed over a publicly available Stack Exchange data dump. A graph-based methodology is employed to derive the characteristics of Reputation Collectors and Caretakers. Results reveal that Reputation Collectors are the primary source of low-quality answers as well as answers to duplicate questions posted on the site. Caretakers answer a limited number of challenging questions and earn maximum reputation from them, whereas Reputation Collectors answer many low-quality and duplicate questions to gain reputation points. We have developed algorithms to identify the Caretakers and Reputation Collectors of the site. Our analysis finds that 1.05% of Reputation Collectors post 18.88% of low-quality answers. This study extends previous research by identifying Reputation Collectors and how they collect their reputation points.
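The published methodology is graph-based; as a much simpler illustration of the underlying idea, the sketch below flags users whose answers are disproportionately low-quality or aimed at duplicate questions. The column names and threshold are invented, not taken from the paper.

import pandas as pd

# Hypothetical per-answer records from a Stack Exchange data dump.
answers = pd.DataFrame({
    "user_id":      [1, 1, 1, 2, 2],
    "low_quality":  [True, True, False, False, False],
    "to_duplicate": [True, False, True, False, False],
})

per_user = answers.groupby("user_id").agg(
    n_answers=("low_quality", "size"),
    bad_share=("low_quality", "mean"),
    dup_share=("to_duplicate", "mean"),
)

# Crude rule: a high combined share of low-quality and duplicate-question
# answers suggests a Reputation Collector rather than a Caretaker.
per_user["reputation_collector"] = (per_user["bad_share"] + per_user["dup_share"]) / 2 > 0.5
print(per_user)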
89

Automatic Question Answering and Knowledge Discovery from Electronic Health Records

Wang, Ping 25 August 2021 (has links)
Electronic Health Records (EHR) data contain comprehensive longitudinal patient information, which is usually stored in databases in the form of either multi-relational structured tables or unstructured texts, e.g., clinical notes. EHRs provide a useful resource to assist doctors' decision making; however, they also present many unique challenges that limit efficient use of the valuable information, such as large data volume, heterogeneous and dynamic information, medical term abbreviations, and noise caused by misspelled words. This dissertation focuses on the development and evaluation of advanced machine learning algorithms to address the following research questions: (1) How to seek answers from EHR for clinical activity related questions posed in human language without the assistance of database and natural language processing (NLP) domain experts, (2) How to discover underlying relationships of different events and entities in structured tabular EHRs, and (3) How to predict when a medical event will occur and estimate its probability based on previous medical information of patients. First, to automatically retrieve answers for natural language questions from the structured tables in EHR, we study the question-to-SQL generation task by generating the corresponding SQL query for the input question. We propose a translation-edit model driven by a language generation module and an editing module for the SQL query generation task. This model helps automatically translate clinical activity related questions to SQL queries, so that doctors only need to provide their questions in natural language to get the answers they need. We also create a large-scale dataset for question answering on tabular EHR to simulate a more realistic setting. Our performance evaluation shows that the proposed model is effective in handling the unique challenges of clinical terminology, such as abbreviations and misspelled words. Second, to automatically identify answers for natural language questions from unstructured clinical notes in EHR, we propose to achieve this goal by querying a knowledge base constructed from fine-grained document-level expert annotations of clinical records for various NLP tasks. We first create a dataset for clinical knowledge base question answering with two components: a clinical knowledge base and question-answer pairs. An attention-based aspect-level reasoning model is developed and evaluated on the new dataset. Our experimental analysis shows that it is effective in identifying answers and also allows us to analyze the impact of different answer aspects in predicting correct answers. Third, we focus on discovering underlying relationships of different entities (e.g., patient, disease, medication, and treatment) in tabular EHR, which can be formulated as a link prediction problem in the graph domain. We develop a self-supervised learning framework for better representation learning of entities across a large corpus and also consider local contextual information for the downstream link prediction task. We demonstrate the effectiveness, interpretability, and scalability of the proposed model on the healthcare network built from tabular EHR. It is also successfully applied to solve link prediction problems in a variety of domains, such as e-commerce, social networks, and academic networks.
Finally, to dynamically predict the occurrence of multiple correlated medical events, we formulate the problem as a temporal (multiple time-points) multi-task learning problem using tensor representation. We propose an algorithm to jointly and dynamically predict several survival problems at each time point and optimize it with the Alternating Direction Method of Multipliers (ADMM) algorithm. The model allows us to consider both the dependencies between different tasks and the correlations of each task at different time points. We evaluate the proposed model on two real-world applications and demonstrate its effectiveness and interpretability. / Doctor of Philosophy / Healthcare is an important part of our lives. Due to recent advances in data collection and storage techniques, a large amount of medical information is generated and stored in Electronic Health Records (EHR). By comprehensively documenting the longitudinal medical history of a large patient cohort, EHR data form a fundamental resource for assisting doctors' decision making, including optimization of treatments for patients and selection of patients for clinical trials. However, EHR data also present a number of unique challenges, such as (i) large-scale and dynamic data, (ii) heterogeneity of medical information, and (iii) medical term abbreviations. It is difficult for doctors to effectively utilize such complex data collected in a typical clinical practice. Therefore, it is imperative to develop advanced methods that help make efficient use of EHR and further benefit doctors in their clinical decision making. This dissertation focuses on automatically retrieving useful medical information, analyzing complex relationships of medical entities, and predicting future medical outcomes from EHR data. In order to retrieve information from EHR efficiently, we develop deep learning based algorithms that can automatically answer various clinical questions on structured and unstructured EHR data. These algorithms can help us understand more about the challenges in retrieving information from different data types in EHR. We also build a clinical knowledge graph based on EHR, link the distributed medical information, and further perform the link prediction task, which allows us to analyze the complex underlying relationships of various medical entities. In addition, we propose a temporal multi-task survival analysis method to dynamically predict multiple medical events at the same time and identify the most important factors leading to future medical events. By handling these unique challenges in EHR and developing suitable approaches, we hope to improve the efficiency of information retrieval and predictive modeling in healthcare.
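The dissertation's self-supervised link prediction framework is not reproduced here; as a far simpler stand-in, the sketch below scores a candidate patient-medication link by neighborhood overlap in a toy healthcare graph using networkx. All entities and edges are invented for illustration.

import networkx as nx

# Toy healthcare network built from co-occurrence of entities.
G = nx.Graph()
G.add_edges_from([
    ("patient_1", "diabetes"), ("patient_1", "metformin"),
    ("patient_2", "diabetes"), ("patient_3", "diabetes"),
    ("patient_3", "metformin"), ("diabetes", "metformin"),
])

# Jaccard similarity of neighborhoods as a link prediction score:
# a classical baseline, not the learned representations of the thesis.
for u, v, score in nx.jaccard_coefficient(G, [("patient_2", "metformin")]):
    print(u, v, round(score, 3))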
90

Building a Trustworthy Question Answering System for Covid-19 Tracking

Liu, Yiqing 02 September 2021 (has links)
During the unprecedented global pandemic of Covid-19, the general public is suffering from inaccurate Covid-19 related information, including outdated information and fake news. The most widely used media (TV, social media, newspapers, and radio) fall short of providing the certainty and rapid updates that people are seeking. To cope with this challenge, several public data resources dedicated to providing Covid-19 information have emerged. They rally experts from different fields to provide authoritative and up-to-date pandemic updates. However, the general public still cannot make full use of such resources, since the learning curve is too steep, especially for elderly and young users. To address this problem, in this Thesis we propose a question answering system that can be interacted with using simple natural language sentences. While building this system, we investigate qualified public data resources and, from the content they provide, collect a set of frequently asked questions for Covid-19 tracking. We further build a dedicated dataset named CovidQA for evaluating the performance of the question answering system with different models. Based on the new dataset, we assess multiple machine learning-based models built for retrieving relevant information from databases, and then propose two empirical models that use pre-defined templates to generate SQL queries. In our experiments, we present both quantitative and qualitative results and provide a comprehensive comparison between different types of methods. The results show that the proposed template-based methods are simple but effective in building question answering systems for specific domain problems. / Master of Science / During the unprecedented global pandemic of Covid-19, the general public is suffering from inaccurate Covid-19 related information, including outdated information and fake news. The most widely used media (TV, social media, newspapers, and radio) fall short of providing the certainty and rapid updates that people are seeking. To cope with this challenge, several public data resources dedicated to providing Covid-19 information have emerged. They rally experts from different fields to provide authoritative and up-to-date pandemic updates. However, there is room for improvement in terms of user experience. To address this problem, in this Thesis we propose a system that can be interacted with using natural questions. While building this system, we evaluate and choose six qualified public data providers as the data sources. We further build a testing dataset for evaluating the performance of the system. We assess two Artificial Intelligence-powered models for the system, and then propose two rule-based models for the researched problem. In our experiments, we provide a comprehensive comparison between different types of methods. The results show that the proposed rule-based methods are simple but effective in building such systems.
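A template-based question-to-SQL model of the kind the thesis proposes can be sketched as a pattern table: a recognized question shape is mapped to a pre-defined SQL query with slots filled from the question. The patterns, table, and column names below are invented for illustration.

import re
from typing import Optional

TEMPLATES = [
    (re.compile(r"how many (?:new )?cases in (?P<loc>[\w\s]+?)\??$", re.I),
     "SELECT new_cases FROM covid_stats WHERE location = '{loc}' ORDER BY date DESC LIMIT 1;"),
    (re.compile(r"how many deaths in (?P<loc>[\w\s]+?)\??$", re.I),
     "SELECT deaths FROM covid_stats WHERE location = '{loc}' ORDER BY date DESC LIMIT 1;"),
]

def to_sql(question: str) -> Optional[str]:
    """Fill and return the SQL template of the first matching pattern."""
    for pattern, sql in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            return sql.format(loc=match.group("loc").strip())
    return None  # no template matched

print(to_sql("How many new cases in Virginia?"))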
