81

Syntax-based Concept Extraction For Question Answering

Glinos, Demetrios 01 January 2006 (has links)
Question answering (QA) stands squarely along the path from document retrieval to text understanding. As an area of research interest, it serves as a proving ground where strategies for document processing, knowledge representation, question analysis, and answer extraction may be evaluated in real world information extraction contexts. The task is to go beyond the representation of text documents as "bags of words" or data blobs that can be scanned for keyword combinations and word collocations in the manner of internet search engines. Instead, the goal is to recognize and extract the semantic content of the text, and to organize it in a manner that supports reasoning about the concepts represented. The issue presented is how to obtain and query such a structure without either a predefined set of concepts or a predefined set of relationships among concepts. This research investigates a means for acquiring from text documents both the underlying concepts and their interrelationships. Specifically, a syntax-based formalism for representing atomic propositions that are extracted from text documents is presented, together with a method for constructing a network of concept nodes for indexing such logical forms based on the discourse entities they contain. It is shown that meaningful questions can be decomposed into Boolean combinations of question patterns using the same formalism, with free variables representing the desired answers. It is further shown that this formalism can be used for robust question answering using the concept network and WordNet synonym, hypernym, hyponym, and antonym relationships. This formalism was implemented in the Semantic Extractor (SEMEX) research tool and was tested against the factoid questions from the 2005 Text Retrieval Conference (TREC), which operated upon the AQUAINT corpus of newswire documents. After adjusting for the limitations of the tool and the document set, correct answers were found for approximately fifty percent of the questions analyzed, which compares favorably with other question answering systems.
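As an illustration of the WordNet relationships the abstract relies on, the following minimal NLTK sketch expands a question term through synonym, hypernym, hyponym, and antonym lemmas. The expansion strategy shown is an assumption for illustration, not SEMEX's actual algorithm.

import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def expand_term(word: str) -> set[str]:
    """Collect synonym, hypernym, hyponym, and antonym lemmas for a term."""
    related = set()
    for synset in wn.synsets(word):
        related.update(lemma.name() for lemma in synset.lemmas())      # synonyms
        for hyper in synset.hypernyms():                               # more general concepts
            related.update(lemma.name() for lemma in hyper.lemmas())
        for hypo in synset.hyponyms():                                 # more specific concepts
            related.update(lemma.name() for lemma in hypo.lemmas())
        for lemma in synset.lemmas():                                  # antonyms, where defined
            related.update(a.name() for a in lemma.antonyms())
    return related

print(sorted(expand_term("capital"))[:10])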
82

Leveraging Large Language Models Trained on Code for Symbol Binding

Robinson, Joshua 09 August 2022 (has links) (PDF)
While large language models like GPT-3 have achieved impressive results in the zero-, one-, and few-shot settings, they still significantly underperform on some tasks relative to the state of the art (SOTA). For many tasks it would be useful to have answer options explicitly listed out in a multiple choice format, decreasing computational cost and allowing the model to reason about the relative merits of possible answers. We argue that the reason this hasn't helped models like GPT-3 close the gap with the SOTA is that these models struggle with symbol binding - associating each answer option with a symbol that represents it. To ameliorate this situation we introduce index prompting, a way of leveraging language models trained on code to successfully answer multiple choice formatted questions. When used with the OpenAI Codex model, our method improves accuracy by about 18% on average in the few-shot setting relative to GPT-3 across 8 datasets representing 4 common NLP tasks. It also achieves a new single-model state of the art on ANLI R3, ARC (Easy), and StoryCloze, suggesting that GPT-3's latent "understanding" has been previously underestimated.
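The paper's exact prompt format is not reproduced here, but the general shape of index prompting can be sketched as follows: the multiple-choice question is cast as code, so a model trained on code only has to emit a list index, binding each option to a position rather than reproducing its text. The prompt wording below is an assumption for illustration.

def index_prompt(question: str, options: list[str]) -> str:
    """Format a multiple-choice question as Python code for a code LM."""
    lines = [f'question = "{question}"', "options = ["]
    lines += [f'    "{opt}",' for opt in options]
    # The model completes the bracketed expression with an integer index.
    lines += ["]", "# The index of the correct answer is:", "answer = options["]
    return "\n".join(lines)

print(index_prompt("What is the capital of France?", ["London", "Paris", "Berlin"]))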
83

Numerical Reasoning in NLP: Challenges, Innovations, and Strategies for Handling Mathematical Equivalency

Liu, Qianying 25 September 2023 (has links)
Kyoto University / New-system doctoral program / Doctor of Informatics / Degree No. Kō-24929 / Informatics Doctorate No. 840 / Call number 新制||情||140 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / Examination committee: Program-Specific Professor Sadao Kurohashi (chair), Professor Tatsuya Kawahara, Professor Ko Nishino / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
84

Studies on Question Answering in Open-Book and Closed-Book Settings

Alkhaldi, Tareq Yaser Samih 25 September 2023 (has links)
Kyoto University / New-system doctoral program / Doctor of Informatics / Degree No. Kō-24930 / Informatics Doctorate No. 841 / Call number 新制||情||141 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / Examination committee: Program-Specific Professor Sadao Kurohashi (chair), Professor Tatsuya Kawahara, Professor Hisashi Kashima / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
85

Question Answering on the Textbook 'Health Information Systems' Using Unsupervised Training of a Pretrained Transformer

Keller, Paul 27 November 2023 (has links)
Extracting knowledge from books is essential and complex. Easy and complete access to knowledge is particularly important in medical informatics. In this thesis, a pretrained language model was used to make the content of the book Health Information Systems by Winter et al. (2023) more efficiently and easily accessible. During training, the quality of the model was evaluated at several checkpoints: the model answered examination questions from the book and from Leipzig University modules that build on the book's content. Finally, the training checkpoints were compared with the model without further training and with the state-of-the-art model GPT-4. With a macro-F1 score of 0.7, GPT-4 achieved the highest correctness in answering the exam questions; none of the other models matched this. However, continual training raised the fine-tuned model's performance from an initial macro-F1 score of 0.13 to 0.33. The results show a clear performance gain from this approach and provide a basis for future extensions. This demonstrates the feasibility of answering questions about health information systems and solving a sample exam with further-trained language models; the models do not, however, reach practical applicability, since their performance lies below the current state of the art and they cannot answer the majority of the questions completely correctly. [Contents: 1 Introduction (subject, problem, motivation, objectives, GMDS ethical guidelines, task, structure) / 2 Foundations (language models and Transformer architectures, neural networks, data processing) / 3 State of research (continual pretraining, current models and their usability, open problems) / 4 Approach (model selection, data curation and text extraction, unsupervised continual training, exam questions, model evaluation) / 5 Implementation (model download, training and DeepSpeed configuration, GPU cluster training, answer generation, evaluation criteria: correctness, explainability, question understanding, robustness) / 6 Results / 7 Discussion (model limits, problems with core questions, grading with exam points) / 8 Outlook (model scaling and quantization, human reinforcement learning, dataset growth, domain-specific models, adapter-based training, text extraction from context, retrieval augmented generation)]
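Macro-F1, the headline metric above, averages per-class F1 scores so that each answer grade counts equally regardless of frequency. A minimal scikit-learn sketch, with illustrative grade labels that are not taken from the thesis:

from sklearn.metrics import f1_score

# Hypothetical graded exam answers: gold grades vs. model-assigned grades.
gold = ["correct", "incorrect", "partial", "correct", "incorrect"]
pred = ["correct", "partial", "partial", "incorrect", "incorrect"]

# average="macro" weights the grade classes equally, however rare.
print(f1_score(gold, pred, average="macro"))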
86

Grounded and Consistent Question Answering

Alberti, Christopher Brian January 2023 (has links)
This thesis describes advancements in question answering along three general directions: model architecture extensions, explainable question answering, and data augmentation. Chapter 2 describes the first state-of-the-art model for the Natural Questions dataset based on pretrained transformers. Chapters 3 and 4 describe extensions to the model architecture designed to accommodate long textual inputs and multimodal text+image inputs, establishing new state-of-the-art results on the Natural Questions and on the VCR dataset. Chapter 5 shows that significant improvements can be obtained with data augmentation on the SQuAD and Natural Questions datasets, introducing roundtrip consistency as a simple heuristic to improve the quality of synthetic data. In Chapters 6 and 7 we explore explainable question answering, demonstrating the usefulness of a new, concrete kind of structured explanation, QED, and proposing a semantic analysis of why-questions in the Natural Questions as a way of better understanding the nature of real-world explanations. Finally, in Chapters 8 and 9 we delve into more exploratory data augmentation techniques for question answering. We look respectively at how straight-through gradients can be utilized to optimize roundtrip consistency in a pipeline of models on the fly, and at how very recent large language models like PaLM can be used to generate synthetic question answering datasets for new languages given as few as five representative examples per language.
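Roundtrip consistency, the filtering heuristic of Chapter 5, can be read as: keep a synthetic (question, answer) pair only if a QA model re-derives the original answer from the generated question. A minimal sketch under that reading; the QA callable is a placeholder, not the thesis's model:

from typing import Callable, List, Tuple

def roundtrip_filter(
    pairs: List[Tuple[str, str, str]],           # (context, question, answer)
    answer_question: Callable[[str, str], str],  # placeholder QA model
) -> List[Tuple[str, str, str]]:
    """Keep synthetic pairs whose answer survives the roundtrip."""
    kept = []
    for context, question, answer in pairs:
        # Re-answer the generated question; drop the pair on a mismatch.
        if answer_question(context, question).strip().lower() == answer.strip().lower():
            kept.append((context, question, answer))
    return kept

# Example with a trivial stand-in model that extracts the last word.
demo = [("Paris is the capital of France.", "What is the capital of France?", "France.")]
print(roundtrip_filter(demo, lambda ctx, q: ctx.split()[-1]))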
87

Transfer Learning and Attention Mechanisms in a Multimodal Setting

Greco, Claudio 13 May 2022 (has links)
Humans are able to develop a solid knowledge of the world around them: they can leverage information coming from different sources (e.g., language, vision), focus on the most relevant information in the input they receive in a given situation, and exploit what they have learned before without forgetting it. In the fields of Artificial Intelligence and Computational Linguistics, replicating these human abilities in artificial models is a major challenge. Recently, models based on pre-training and on attention mechanisms, namely pre-trained multimodal Transformers, have been developed. They seem to perform tasks surprisingly well compared to other computational models in multiple contexts. They simulate human-like cognition in that they supposedly rely on previously acquired knowledge (transfer learning) and focus on the most important information in the input (attention mechanisms). Nevertheless, we still do not know whether these models can deal with multimodal tasks that require merging different types of information simultaneously, as humans would do. This thesis attempts to fill this gap in our knowledge of multimodal models by investigating the ability of pre-trained Transformers to encode multimodal information, and the ability of attention-based models to remember how to deal with previously solved tasks. With regard to pre-trained Transformers, we focus on their ability to rely on pre-training and on attention while dealing with tasks that require merging information coming from language and vision. More precisely, we investigate whether pre-trained multimodal Transformers are able to understand the internal structure of a dialogue (e.g., the organization of turns); to solve complex spatial questions that require processing different spatial elements (e.g., regions of the image, proximity between elements, etc.); and to make predictions based on complementary multimodal cues (e.g., guessing the most plausible action by leveraging the content of a sentence and of an image). The results of this thesis indicate that pre-trained Transformers outperform other models: they are able, to some extent, to integrate complementary multimodal information, and they manage to pinpoint both the relevant turns in a dialogue and the most important regions in an image. These results suggest that pre-training and attention play a key role in how pre-trained Transformers encode their input. Nevertheless, their way of processing information cannot be considered human-like. When compared to humans, they struggle (as non-pre-trained models do) to understand negative answers, to merge spatial information in difficult questions, and to predict actions based on complementary linguistic and visual cues. With regard to attention-based models, we found that these models tend to forget what they have learned in previously solved tasks. However, training these models on easy tasks before more complex ones seems to mitigate this catastrophic forgetting phenomenon. These results indicate that, at least in this context, attention-based models (and, presumably, pre-trained Transformers too) are sensitive to task order. Better control of this variable may therefore help multimodal models learn sequentially and continuously as humans do.
88

Identifying reputation collectors in community question answering (CQA) sites: Exploring the dark side of social media

Roy, P.K., Singh, J.P., Baabdullah, A.M., Kizgin, Hatice, Rana, Nripendra P. 08 August 2019 (has links)
This research aims to identify users who post, as well as encourage others to post, low-quality and duplicate content on community question answering sites. The good guys, called Caretakers, and the bad guys, called Reputation Collectors, are characterised by their behaviour, answering patterns, and reputation points. The proposed system is developed and analysed over a publicly available Stack Exchange data dump. A graph-based methodology is employed to derive the characteristics of Reputation Collectors and Caretakers. Results reveal that Reputation Collectors are the primary source of low-quality answers as well as answers to duplicate questions posted on the site. Caretakers answer a limited number of challenging questions and earn maximum reputation from them, whereas Reputation Collectors answer many low-quality and duplicate questions to gain reputation points. We have developed algorithms to identify the Caretakers and Reputation Collectors of the site. Our analysis finds that 1.05% of Reputation Collectors post 18.88% of low-quality answers. This study extends previous research by identifying Reputation Collectors and how they collect their reputation points.
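The published methodology is graph-based; as a much simpler illustration of the underlying idea, the sketch below flags users whose answers are disproportionately low-quality or aimed at duplicate questions. The column names and threshold are invented, not taken from the paper.

import pandas as pd

# Hypothetical per-answer records from a Stack Exchange data dump.
answers = pd.DataFrame({
    "user_id":      [1, 1, 1, 2, 2],
    "low_quality":  [True, True, False, False, False],
    "to_duplicate": [True, False, True, False, False],
})

per_user = answers.groupby("user_id").agg(
    n_answers=("low_quality", "size"),
    bad_share=("low_quality", "mean"),
    dup_share=("to_duplicate", "mean"),
)

# Crude rule: a high combined share of low-quality and duplicate-question
# answers suggests a Reputation Collector rather than a Caretaker.
per_user["reputation_collector"] = (per_user["bad_share"] + per_user["dup_share"]) / 2 > 0.5
print(per_user)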
89

Automatic Question Answering and Knowledge Discovery from Electronic Health Records

Wang, Ping 25 August 2021 (has links)
Electronic Health Records (EHR) data contain comprehensive longitudinal patient information, which is usually stored in databases in the form of either multi-relational structured tables or unstructured texts, e.g., clinical notes. EHRs provide a useful resource to assist doctors' decision making; however, they also present many unique challenges that limit efficient use of the valuable information, such as large data volume, heterogeneous and dynamic information, medical term abbreviations, and noise caused by misspelled words. This dissertation focuses on the development and evaluation of advanced machine learning algorithms to address the following research questions: (1) How to seek answers from EHR for clinical activity related questions posed in human language without the assistance of database and natural language processing (NLP) domain experts, (2) How to discover underlying relationships of different events and entities in structured tabular EHRs, and (3) How to predict when a medical event will occur and estimate its probability based on previous medical information of patients. First, to automatically retrieve answers for natural language questions from the structured tables in EHR, we study the question-to-SQL generation task by generating the corresponding SQL query for the input question. We propose a translation-edit model driven by a language generation module and an editing module for the SQL query generation task. This model helps automatically translate clinical activity related questions to SQL queries, so that doctors only need to provide their questions in natural language to get the answers they need. We also create a large-scale dataset for question answering on tabular EHR to simulate a more realistic setting. Our performance evaluation shows that the proposed model is effective in handling the unique challenges of clinical terminology, such as abbreviations and misspelled words. Second, to automatically identify answers for natural language questions from unstructured clinical notes in EHR, we propose to achieve this goal by querying a knowledge base constructed from fine-grained document-level expert annotations of clinical records for various NLP tasks. We first create a dataset for clinical knowledge base question answering with two components: a clinical knowledge base and question-answer pairs. An attention-based aspect-level reasoning model is developed and evaluated on the new dataset. Our experimental analysis shows that it is effective in identifying answers and also allows us to analyze the impact of different answer aspects in predicting correct answers. Third, we focus on discovering underlying relationships of different entities (e.g., patient, disease, medication, and treatment) in tabular EHR, which can be formulated as a link prediction problem in the graph domain. We develop a self-supervised learning framework for better representation learning of entities across a large corpus and also consider local contextual information for the downstream link prediction task. We demonstrate the effectiveness, interpretability, and scalability of the proposed model on the healthcare network built from tabular EHR. It is also successfully applied to solve link prediction problems in a variety of domains, such as e-commerce, social networks, and academic networks.
Finally, to dynamically predict the occurrence of multiple correlated medical events, we formulate the problem as a temporal (multiple time-points) multi-task learning problem using tensor representation. We propose an algorithm to jointly and dynamically predict several survival problems at each time point and optimize it with the Alternating Direction Method of Multipliers (ADMM) algorithm. The model allows us to consider both the dependencies between different tasks and the correlations of each task at different time points. We evaluate the proposed model on two real-world applications and demonstrate its effectiveness and interpretability. / Doctor of Philosophy / Healthcare is an important part of our lives. Due to recent advances in data collection and storage techniques, a large amount of medical information is generated and stored in Electronic Health Records (EHR). By comprehensively documenting the longitudinal medical history of a large patient cohort, EHR data form a fundamental resource for assisting doctors' decision making, including optimization of treatments for patients and selection of patients for clinical trials. However, EHR data also present a number of unique challenges, such as (i) large-scale and dynamic data, (ii) heterogeneity of medical information, and (iii) medical term abbreviations. It is difficult for doctors to effectively utilize such complex data collected in a typical clinical practice. Therefore, it is imperative to develop advanced methods that help make efficient use of EHR and further benefit doctors in their clinical decision making. This dissertation focuses on automatically retrieving useful medical information, analyzing complex relationships of medical entities, and predicting future medical outcomes from EHR data. In order to retrieve information from EHR efficiently, we develop deep learning based algorithms that can automatically answer various clinical questions on structured and unstructured EHR data. These algorithms can help us understand more about the challenges in retrieving information from different data types in EHR. We also build a clinical knowledge graph based on EHR, link the distributed medical information, and further perform the link prediction task, which allows us to analyze the complex underlying relationships of various medical entities. In addition, we propose a temporal multi-task survival analysis method to dynamically predict multiple medical events at the same time and identify the most important factors leading to future medical events. By handling these unique challenges in EHR and developing suitable approaches, we hope to improve the efficiency of information retrieval and predictive modeling in healthcare.
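The dissertation's self-supervised link prediction framework is not reproduced here; as a far simpler stand-in, the sketch below scores a candidate patient-medication link by neighborhood overlap in a toy healthcare graph using networkx. All entities and edges are invented for illustration.

import networkx as nx

# Toy healthcare network built from co-occurrence of entities.
G = nx.Graph()
G.add_edges_from([
    ("patient_1", "diabetes"), ("patient_1", "metformin"),
    ("patient_2", "diabetes"), ("patient_3", "diabetes"),
    ("patient_3", "metformin"), ("diabetes", "metformin"),
])

# Jaccard similarity of neighborhoods as a link prediction score:
# a classical baseline, not the learned representations of the thesis.
for u, v, score in nx.jaccard_coefficient(G, [("patient_2", "metformin")]):
    print(u, v, round(score, 3))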
90

Building a Trustworthy Question Answering System for Covid-19 Tracking

Liu, Yiqing 02 September 2021 (has links)
During the unprecedented global pandemic of Covid-19, the general public is suffering from inaccurate Covid-19 related information, including outdated information and fake news. The most widely used media (TV, social media, newspapers, and radio) fall short of providing the certainty and rapid updates that people are seeking. To cope with this challenge, several public data resources dedicated to providing Covid-19 information have emerged. They rally experts from different fields to provide authoritative and up-to-date pandemic updates. However, the general public still cannot make full use of such resources, since the learning curve is too steep, especially for elderly and young users. To address this problem, in this Thesis we propose a question answering system that can be interacted with using simple natural language sentences. While building this system, we investigate qualified public data resources and, from the content they provide, collect a set of frequently asked questions for Covid-19 tracking. We further build a dedicated dataset named CovidQA for evaluating the performance of the question answering system with different models. Based on the new dataset, we assess multiple machine learning-based models built for retrieving relevant information from databases, and then propose two empirical models that use pre-defined templates to generate SQL queries. In our experiments, we present both quantitative and qualitative results and provide a comprehensive comparison between different types of methods. The results show that the proposed template-based methods are simple but effective in building question answering systems for specific domain problems. / Master of Science / During the unprecedented global pandemic of Covid-19, the general public is suffering from inaccurate Covid-19 related information, including outdated information and fake news. The most widely used media (TV, social media, newspapers, and radio) fall short of providing the certainty and rapid updates that people are seeking. To cope with this challenge, several public data resources dedicated to providing Covid-19 information have emerged. They rally experts from different fields to provide authoritative and up-to-date pandemic updates. However, there is room for improvement in terms of user experience. To address this problem, in this Thesis we propose a system that can be interacted with using natural questions. While building this system, we evaluate and choose six qualified public data providers as the data sources. We further build a testing dataset for evaluating the performance of the system. We assess two Artificial Intelligence-powered models for the system, and then propose two rule-based models for the researched problem. In our experiments, we provide a comprehensive comparison between different types of methods. The results show that the proposed rule-based methods are simple but effective in building such systems.
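A template-based question-to-SQL model of the kind the thesis proposes can be sketched as a pattern table: a recognized question shape is mapped to a pre-defined SQL query with slots filled from the question. The patterns, table, and column names below are invented for illustration.

import re
from typing import Optional

TEMPLATES = [
    (re.compile(r"how many (?:new )?cases in (?P<loc>[\w\s]+?)\??$", re.I),
     "SELECT new_cases FROM covid_stats WHERE location = '{loc}' ORDER BY date DESC LIMIT 1;"),
    (re.compile(r"how many deaths in (?P<loc>[\w\s]+?)\??$", re.I),
     "SELECT deaths FROM covid_stats WHERE location = '{loc}' ORDER BY date DESC LIMIT 1;"),
]

def to_sql(question: str) -> Optional[str]:
    """Fill and return the SQL template of the first matching pattern."""
    for pattern, sql in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            return sql.format(loc=match.group("loc").strip())
    return None  # no template matched

print(to_sql("How many new cases in Virginia?"))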
