  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
151

A Study on Effective Approaches for Exploiting Temporal Information in News Archives / ニュースアーカイブの時制情報活用のための有効な手法に関する研究

Wang, Jiexin 26 September 2022 (has links)
Kyoto University / Doctoral degree by coursework (new system) / Doctor of Informatics / Degree No. Kō 24259 / Jōhaku No. 803 / Call no. 新制||情||135 (University Library) / Department of Social Informatics, Graduate School of Informatics, Kyoto University / Examining committee: Professor Masatoshi Yoshikawa (chair), Professor Keishi Tajima, Professor Sadao Kurohashi, Program-Specific Associate Professor LIN Donghui / Qualifies under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
152

Investigating the Effect of Complementary Information Stored in Multiple Languages on Question Answering Performance : A Study of the Multilingual-T5 for Extractive Question Answering / Vad är effekten av kompletterande information lagrad i flera språk på frågebesvaring : En undersökning av multilingual-T5 för frågebesvaring

Aurell Hansson, Björn January 2021 (has links)
Extractive question answering is a popular domain in the field of natural language processing, where machine learning models are tasked with answering questions given a context. Historically, the field has been centered on monolingual models, but recently more and more multilingual models have been developed, such as Google’s MT5 [1]. Because of this, machine translations of English data have been used when training and evaluating these models, but machine translations can be of degraded quality and do not always reflect the target language faithfully. This report investigates whether complementary information stored in other languages can improve monolingual QA performance for languages where only machine translations are available. It also investigates whether exposure to more languages can improve zero-shot cross-lingual QA performance (i.e. when the question and answer do not have matching languages) by providing complementary information. We fine-tune 3 different MT5 models on QA datasets consisting of machine translations, as well as one model on those datasets combined with 3 other datasets that are not translations. We then evaluate the different models on the MLQA and XQuAD datasets. The results show that for 2 out of the 3 languages evaluated, complementary information stored in other languages had a positive effect on the QA performance of MT5. For zero-shot cross-lingual QA, the complementary information offered by the fused model led to improved performance compared to 2 of the 3 MT5 models trained only on translated data, indicating that complementary information from other languages does not offer a consistent improvement in this regard. / Frågebesvaring (QA) är en populär domän inom naturlig språkbehandling, där maskininlärningsmodeller har till uppgift att svara på frågor. Historiskt har fältet varit inriktat på enspråkiga modeller, men nyligen har fler och fler flerspråkiga modeller utvecklats, till exempel Googles MT5 [1]. På grund av detta har maskinöversättningar av engelska använts vid träning och utvärdering av dessa modeller, men maskinöversättningar kan vara försämrade och speglar inte alltid deras målspråk rättvist. Denna rapport undersöker om kompletterande information som lagras i andra språk kan förbättra enspråkig QA-prestanda för språk där endast maskinöversättningar är tillgängliga. Den undersöker också om exponering för fler språk kan förbättra QA-prestanda på zero-shot cross-lingual QA (dvs. där frågan och svaret inte har matchande språk) genom att tillhandahålla kompletterande information. Vi finjusterar 3 olika modeller på QA-datamängder som består av maskinöversättningar, samt en modell på datamängderna tillsammans i kombination med 3 andra datamängder som inte är översättningar. Vi utvärderar sedan de olika modellerna på MLQA- och XQuAD-datauppsättningarna. Resultaten visar att för 2 av de 3 utvärderade språken hade kompletterande information som lagrats i andra språk en positiv effekt på QA-prestanda. För zero-shot cross-lingual QA leder den kompletterande informationen som erbjuds av den sammansmälta modellen till förbättrad prestanda jämfört med 2/3 av modellerna som tränats endast på översättningar, vilket indikerar att kompletterande information från andra språk inte ger någon förbättring i detta avseende.
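As a rough illustration of the kind of setup the abstract above describes, the sketch below casts extractive QA as text-to-text generation with an mT5 checkpoint from Hugging Face transformers. The checkpoint name, prompt format and generation settings are assumptions for the sketch, not the thesis's actual configuration, and the model would first need to be fine-tuned on QA data (machine-translated or not) before it gives sensible answers.

```python
# Minimal sketch of extractive-style QA with a multilingual T5 model.
# Assumptions: checkpoint name, "question: ... context: ..." prompt format,
# and generation settings; fine-tuning on a QA dataset is required first.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

model_name = "google/mt5-small"  # any mT5 checkpoint would do for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MT5ForConditionalGeneration.from_pretrained(model_name)

def answer(question: str, context: str) -> str:
    # mT5 is text-to-text, so "extractive" QA is cast as generating the answer
    # span given a question/context prompt.
    prompt = f"question: {question} context: {context}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=32)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Zero-shot cross-lingual use: question in one language, context in another.
print(answer("Vilket år grundades KTH?",
             "KTH Royal Institute of Technology was founded in 1827."))
```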
153

Retrieving Definitions from Scientific Text in the Salmon Fish Domain by Lexical Pattern Matching

Gabbay, Igal 01 1900 (has links)
While an information retrieval system takes as input a user query and returns a list of relevant documents chosen from a large collection, a question answering system attempts to produce an exact answer. Recent research, motivated by the question answering track of the Text REtrieval Conference (TREC) has focused mainly on answering ‘factoid’ questions concerned with names, places, dates etc. in the news domain. However, questions seeking definitions of terms are common in the logs of search engines. The objective of this project was therefore to investigate methods of retrieving definitions from scientific documents. The subject domain was salmon, and an appropriate test collection of articles was created, pre-processed and indexed. Relevant terms were obtained from salmon researchers and a fish database. A system was built which accepted a term as input, retrieved relevant documents from the collection using a search engine, identified definition phrases within them using a vocabulary of syntactic patterns and associated heuristics, and produced as output phrases explaining the term. Four experiments were carried out which progressively extended and refined the patterns. The performance of the system, measured using an appropriate form of precision, improved over the experiments from 8.6% to 63.6%. The main findings of the research were: (1) Definitions were diverse despite the documents’ homogeneity and found not only in the Introduction and Abstract sections but also in the Methods and References; (2) Nevertheless, syntactic patterns were a useful starting point in extracting them; (3) Three patterns accounted for 90% of candidate phrases; (4) Statistically, the ordinal number of the instance of the term in a document was a better indicator of the presence of a definition than either sentence position and length, or the number of sentences in the document. Next steps include classifying terms, using information extraction-like templates, resolving basic anaphors, ranking answers, exploiting the structure of scientific papers, and refining the evaluation process.
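A minimal sketch of the core idea, lexical pattern matching for definition phrases, is given below. The regular expressions and example sentences are illustrative assumptions; the thesis used a larger vocabulary of syntactic patterns plus heuristics (e.g., the ordinal number of the term instance) over an indexed salmon-domain collection.

```python
# Rough illustration of definition retrieval by lexical pattern matching.
# The patterns and the example text are illustrative, not the thesis's full set.
import re

def definition_patterns(term: str):
    t = re.escape(term)
    return [
        rf"\b{t}\b\s+(?:is|are)\s+(?:a|an|the)\s+[^.]+\.",          # "X is a ..."
        rf"\b{t}\b,\s+(?:also\s+)?(?:known|defined)\s+as\s+[^.]+\.",  # "X, defined as ..."
        rf"\b{t}\b\s+refers?\s+to\s+[^.]+\.",                        # "X refers to ..."
    ]

def find_definitions(term: str, text: str):
    hits = []
    for pattern in definition_patterns(term):
        hits.extend(m.group(0) for m in re.finditer(pattern, text, flags=re.IGNORECASE))
    return hits

doc = ("A smolt is a young salmon that has become adapted to seawater. "
       "Smoltification refers to the physiological changes preceding seaward migration.")
print(find_definitions("smolt", doc))
```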
154

Ontology-Mediated Query Answering over Log-Linear Probabilistic Data: Extended Version

Borgwardt, Stefan, Ceylan, Ismail Ilkan, Lukasiewicz, Thomas 28 December 2023 (has links)
Large-scale knowledge bases are at the heart of modern information systems. Their knowledge is inherently uncertain, and hence they are often materialized as probabilistic databases. However, probabilistic database management systems typically lack the capability to incorporate implicit background knowledge and, consequently, fail to capture some intuitive query answers. Ontology-mediated query answering is a popular paradigm for encoding commonsense knowledge, which can provide more complete answers to user queries. We propose a new data model that integrates the paradigm of ontology-mediated query answering with probabilistic databases, employing a log-linear probability model. We compare our approach to existing proposals, and provide supporting computational results.
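For readers unfamiliar with log-linear probabilistic data, a generic formulation is sketched below; the paper's exact semantics and notation may differ. A database instance I is scored by weighted feature functions, and the probability of a query answer is obtained by summing over instances that, together with the ontology, entail it.

```latex
% Generic log-linear model over possible database instances I, with feature
% functions \phi_i and weights w_i (the paper's precise semantics may differ):
\[
  P(I) \;=\; \frac{1}{Z}\,\exp\!\Big(\sum_i w_i\,\phi_i(I)\Big),
  \qquad
  Z \;=\; \sum_{I'} \exp\!\Big(\sum_i w_i\,\phi_i(I')\Big).
\]
% Ontology-mediated query answering then asks for the probability that a tuple
% \vec{a} answers query q, summing over instances that, together with the
% ontology \mathcal{O}, entail q(\vec{a}):
\[
  P\big(q(\vec{a})\big) \;=\; \sum_{I \,:\, I \cup \mathcal{O} \,\models\, q(\vec{a})} P(I).
\]
```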
155

Hybridní hluboké metody pro automatické odpovídání na otázky / Hybrid Deep Question Answering

Aghaebrahimian, Ahmad January 2019 (has links)
Title: Hybrid Deep Question Answering Author: Ahmad Aghaebrahimian Institute: Institute of Formal and Applied Linguistics Supervisor: RNDr. Martin Holub, Ph.D., Institute of Formal and Applied Linguistics Abstract: As one of the oldest tasks of Natural Language Processing, Question Answering is one of the most exciting and challenging research areas with many scientific and commercial applications. Question Answering, as a discipline at the intersection of computer science, statistics, linguistics, and cognitive science, is concerned with building systems that automatically retrieve answers to questions posed by humans in natural language. This doctoral dissertation presents the author's research carried out in this discipline. It highlights his studies and research toward a hybrid Question Answering system consisting of two engines for Question Answering over structured and unstructured data. The structured engine comprises a state-of-the-art Question Answering system based on knowledge graphs. The unstructured engine consists of a state-of-the-art sentence-level Question Answering system and a word-level Question Answering system with results close to human performance. This work introduces a new Question Answering dataset for answering word- and sentence-level questions as well. Starting from a...
156

Surmize: An Online NLP System for Close-Domain Question-Answering and Summarization

Bergkvist, Alexander, Hedberg, Nils, Rollino, Sebastian, Sagen, Markus January 2020 (has links)
The amount of data available and consumed by people globally is growing. To reduce mental fatigue and increase the general ability to gain insight into complex texts or documents, we have developed an application to aid in this task. The application allows users to upload documents and ask domain-specific questions about them using our web application. A summarized version of each document is presented to the user, which could further facilitate their understanding of the document and guide them towards what types of questions could be relevant to ask. Our application gives users flexibility in the types of documents that can be processed; it is publicly available, stores no user data, and uses state-of-the-art models for its summaries and answers. The result is an application that yields near human-level intuition for answering questions in certain isolated cases, such as Wikipedia and news articles, as well as some scientific texts. The application's reliability and prediction quality decrease as the complexity of the subject, the number of words in the document, and the grammatical inconsistency of the questions increase. These are all aspects that can be improved further if the system is to be used in production. / Mängden data som är tillgänglig och konsumeras av människor växer globalt. För att minska den mentala trötthet och öka den allmänna förmågan att få insikt i komplexa, massiva texter eller dokument, har vi utvecklat en applikation för att bistå i de uppgifterna. Applikationen tillåter användare att ladda upp dokument och fråga kontextspecifika frågor via vår webbapplikation. En sammanfattad version av varje dokument presenteras till användaren, vilket kan ytterligare förenkla förståelsen av ett dokument och vägleda dem mot vad som kan vara relevanta frågor att ställa. Vår applikation ger användare möjligheten att behandla olika typer av dokument, är tillgänglig för alla, sparar ingen personlig data, och använder de senaste modellerna inom språkbehandling för dess sammanfattningar och svar. Resultatet är en applikation som når en nära mänsklig intuition för vissa domäner och frågor, som exempelvis Wikipedia- och nyhetsartiklar, samt viss vetenskaplig text. Noterade undantag för tillämpningen härrör från ämnets komplexitet, grammatiska korrekthet för frågorna och dokumentets längd. Dessa är områden som kan förbättras ytterligare om den används i produktionen.
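The two-step workflow described above (summarize first, then answer questions against the uploaded document) can be sketched with off-the-shelf Hugging Face pipelines. The default checkpoints picked by pipeline() are assumptions here and are not necessarily the models behind Surmize.

```python
# Minimal sketch of a summarize-then-ask document workflow.
# Assumptions: default pipeline checkpoints and the toy document below.
from transformers import pipeline

summarizer = pipeline("summarization")
qa = pipeline("question-answering")

document = (
    "KTH Royal Institute of Technology was founded in 1827 and is located in "
    "Stockholm. It is one of the largest technical universities in Scandinavia."
)

# Step 1: present a summary to help the user decide what to ask.
summary = summarizer(document, max_length=40, min_length=10, do_sample=False)[0]["summary_text"]

# Step 2: answer a domain-specific question against the uploaded document.
answer = qa(question="Where is KTH located?", context=document)["answer"]

print(summary)
print(answer)
```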
157

Developing an enriched natural language grammar for prosodically-improved concept-to-speech synthesis

Marais, Laurette 04 1900 (has links)
The need for interacting with machines using spoken natural language is growing, along with the expectation that synthetic speech in this context sounds natural. Such interaction includes answering questions, where prosody plays an important role in producing natural English synthetic speech by communicating the information structure of utterances. Combinatory Categorial Grammar (CCG) is a theoretical framework that exploits the notion that, in English, information structure, prosodic structure and syntactic structure are isomorphic. This provides a way to convert a semantic representation of an utterance into a prosodically natural spoken utterance. Grammatical Framework (GF) is a framework for writing grammars, where abstract tree structures capture the semantic structure and concrete grammars render these structures as linearised strings. This research combines these frameworks to develop a system that converts semantic representations of utterances into linearised strings of natural language that are marked up to inform the prosody-generating component of a speech synthesis system. / Computing / M. Sc. (Computing)
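As a loose illustration of the final step described above, the toy sketch below linearises an abstract, information-structured representation into a string carrying prosodic mark-up. The accent and boundary labels follow ToBI-style conventions often paired with CCG (L+H* for theme focus, H* for rheme focus); the data structures and mark-up format are assumptions, not the thesis's actual GF grammars or output format.

```python
# Toy linearisation of an information-structured "abstract tree" into a surface
# string with prosodic mark-up. Illustrative only; not the thesis's GF output.
from dataclasses import dataclass

@dataclass
class Constituent:
    words: list[str]
    role: str        # "theme" or "rheme"
    focus: int       # index of the focused word within `words`

def linearise(constituents: list[Constituent]) -> str:
    out = []
    for c in constituents:
        accent = "L+H*" if c.role == "theme" else "H*"
        boundary = "LH%" if c.role == "theme" else "LL%"
        words = [f"{w}[{accent}]" if i == c.focus else w for i, w in enumerate(c.words)]
        out.append(" ".join(words) + f" {boundary}")
    return " ".join(out)

# "Q: Who wrote the report?  A: ANNA wrote the report."
answer = [
    Constituent(words=["Anna"], role="rheme", focus=0),
    Constituent(words=["wrote", "the", "report"], role="theme", focus=0),
]
print(linearise(answer))
# Anna[H*] LL% wrote[L+H*] the report LH%
```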
158

Answer Triggering Mechanisms in Neural Reading Comprehension-based Question Answering Systems

Trembczyk, Max January 2019 (has links)
We implement a state-of-the-art question answering system based on Convolutional Neural Networks and Attention Mechanisms and include four different variants of answer triggering that have been discussed in recent literature. The mechanisms are placed at different points in the architecture and operate on different information. We train, develop and test our models on the popular SQuAD dataset for reading-comprehension-based question answering, which in its latest version has been equipped with additional non-answerable questions that the systems have to detect. We test the models against baselines and against each other and provide an extensive evaluation both on the general question answering task and on the explicit performance of the answer triggering mechanisms. We show that the answer triggering mechanisms all clearly improve the model over the baseline without answer triggering, by 19.6% to 31.3% depending on the model and the metric. The best performance in general question answering is achieved by a model that we call Candidate:No, which treats the possibility that no answer can be found in the document as just another answer candidate, instead of adding an extra decision step somewhere in the model's architecture as the other three mechanisms do. The performance on detecting the non-answerable questions is very similar in three of the four mechanisms, while one performs notably worse. We give suggestions on which approach to use when a more or a less conservative behaviour is desired, and discuss directions for future development.
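The Candidate:No idea, treating "no answer" as just another candidate rather than a separate decision step, can be sketched as a post-processing rule over span logits, similar in spirit to the SQuAD 2.0 null-score convention. The array shapes, the [CLS]-position convention and the threshold below are assumptions, not the thesis's exact architecture.

```python
# Sketch of "no answer as another candidate": compare the best span score with a
# null score taken from the [CLS] position and abstain when the null score wins.
# Illustrative assumptions: position 0 is [CLS], threshold and toy logits below.
import numpy as np

def best_span_or_null(start_logits: np.ndarray,
                      end_logits: np.ndarray,
                      null_threshold: float = 0.0,
                      max_answer_len: int = 30):
    # Position 0 is assumed to be the [CLS] token used for the no-answer score.
    null_score = start_logits[0] + end_logits[0]
    best_score, best_span = -np.inf, None
    for start in range(1, len(start_logits)):
        for end in range(start, min(start + max_answer_len, len(end_logits))):
            score = start_logits[start] + end_logits[end]
            if score > best_score:
                best_score, best_span = score, (start, end)
    if null_score - best_score > null_threshold:
        return None          # "no answer" wins as a candidate
    return best_span

start = np.array([2.0, 0.1, 3.5, 0.2])
end = np.array([1.5, 0.3, 0.1, 4.0])
print(best_span_or_null(start, end))  # (2, 3): best span runs from token 2 to token 3
```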
159

Recherche de réponses précises à des questions médicales : le système de questions-réponses MEANS / Finding precise answers to medical questions : the question-answering system MEANS

Ben Abacha, Asma 28 June 2012 (has links)
La recherche de réponses précises à des questions formulées en langue naturelle renouvelle le champ de la recherche d’information. De nombreux travaux ont eu lieu sur la recherche de réponses à des questions factuelles en domaine ouvert. Moins de travaux ont porté sur la recherche de réponses en domaine de spécialité, en particulier dans le domaine médical ou biomédical. Plusieurs conditions différentes sont rencontrées en domaine de spécialité comme les lexiques et terminologies spécialisés, les types particuliers de questions, entités et relations du domaine ou les caractéristiques des documents ciblés. Dans une première partie, nous étudions les méthodes permettant d’analyser sémantiquement les questions posées par l’utilisateur ainsi que les textes utilisés pour trouver les réponses. Pour ce faire nous utilisons des méthodes hybrides pour deux tâches principales : (i) la reconnaissance des entités médicales et (ii) l’extraction de relations sémantiques. Ces méthodes combinent des règles et patrons construits manuellement, des connaissances du domaine et des techniques d’apprentissage statistique utilisant différents classifieurs. Ces méthodes hybrides, expérimentées sur différents corpus, permettent de pallier les inconvénients des deux types de méthodes d’extraction d’information, à savoir le manque de couverture potentiel des méthodes à base de règles et la dépendance aux données annotées des méthodes statistiques. Dans une seconde partie, nous étudions l’apport des technologies du web sémantique pour la portabilité et l’expressivité des systèmes de questions-réponses. Dans le cadre de notre approche, nous exploitons les technologies du web sémantique pour annoter les informations extraites en premier lieu et pour interroger sémantiquement ces annotations en second lieu. Enfin, nous présentons notre système de questions-réponses, appelé MEANS, qui utilise à la fois des techniques de TAL, des connaissances du domaine et les technologies du web sémantique pour répondre automatiquement aux questions médicales. / With the dramatic growth of digital information, finding precise answers to natural language questions is more and more essential for retrieving domain knowledge in real time. Many research works tackled answer retrieval for factual questions in the open domain. Fewer works were performed for domain-specific question answering, such as in the medical domain. Compared to the open domain, several different conditions are met in the medical domain, such as specialized vocabularies, specific types of questions, and different kinds of domain entities and relations. Document characteristics are also a matter of importance, as, for example, clinical texts tend to use a lot of technical abbreviations while forum pages may use long “approximate” terms. We focus on finding precise answers to natural language questions in the medical field. A key process for this task is to analyze the questions and the source documents semantically and to use standard formalisms to represent the obtained annotations. We propose a medical question-answering approach based on: (i) NLP methods combining domain knowledge, rule-based methods and statistical ones to extract relevant information from questions and documents, and (ii) Semantic Web technologies to represent and interrogate the extracted information.
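The second part of the approach, representing extracted information with Semantic Web technologies and querying it semantically, can be sketched with RDF triples and SPARQL. The namespace, predicates and triples below are illustrative assumptions, not the MEANS system's actual vocabulary.

```python
# Sketch: store extracted medical entities/relations as RDF annotations and
# query them with SPARQL. Namespace, predicates and triples are illustrative.
from rdflib import Graph, Namespace

MED = Namespace("http://example.org/med#")
g = Graph()
g.bind("med", MED)

# Annotations that an entity recogniser / relation extractor might have produced.
g.add((MED.aspirin, MED.treats, MED.headache))
g.add((MED.aspirin, MED.hasSideEffect, MED.stomach_irritation))
g.add((MED.headache, MED.isA, MED.symptom))

# "Which drugs treat headache?" expressed as a SPARQL query over the annotations.
query = """
PREFIX med: <http://example.org/med#>
SELECT ?drug WHERE { ?drug med:treats med:headache . }
"""
for row in g.query(query):
    print(row.drug)
```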
160

Efficient query answering in peer data management systems

Roth, Armin 12 March 2012 (has links)
Peer-Daten-Management-Systeme (PDMS) bestehen aus einer hochdynamischen Menge heterogener, autonomer Peers. Die Peers beantworten Anfragen einerseits gegen lokal gespeicherte Daten und reichen sie andererseits nach einer Umschreibung anhand von Schema-Mappings an benachbarte Peers weiter. Solche aufgrund fehlender zentraler Komponenten eigentlich hochflexiblen Systeme leiden bei zunehmender Anzahl von Peers unter erheblichen Effizienzproblemen. Die Gründe hierfür liegen in der massiven Redundanz der Pfade im Netzwerk der Peers und im Informationsverlust aufgrund von Projektionen entlang von Mapping-Pfaden. Anwender akzeptieren in hochskalierten Umgebungen zum Datenaustausch in vielen Anwendungsszenarien Konzessionen an die Vollständigkeit der Anfrageergebnisse. Unser Ansatz sieht in der Vollständigkeit ein Optimierungsziel und verfolgt einen Kompromiß zwischen Nutzen und Kosten der Anfragebearbeitung. Hierzu schlagen wir mehrere Strategien für Peers vor, um zu entscheiden, an welche Nachbar-Peers Anfragen weitergeleitet werden. Peers schließen dabei Mappings von der Anfragebearbeitung aus, von denen sie ein geringes Verhältnis von Ergebnisgröße zu Kosten, also geringe Effizienz erwarten. Als Basis dieser Schätzungen wenden wir selbstadaptive Histogramme über die Ergebniskardinalität an und weisen nach, daß diese in dieser hochdynamischen Umgebung ausreichende Genauigkeit aufweisen. Wir schlagen einen Kompromiß zwischen der Nutzung von Anfrageergebnissen zur Anpassung dieser Metadaten-Statistiken und der Beschneidung von Anfrageplänen vor, um den entsprechenden Zielkonflikt aufzulösen. Für eine Optimierungsstrategie, die das für die Anfragebearbeitung verwendete Zeit-Budget limitiert, untersuchen wir mehrere Varianten hinsichtlich des Effizienzsteigerungspotentials. Darüber hinaus nutzen wir mehrdimensionale Histogramme über die Überlappung zweier Datenquellen zur gezielten Verminderung der Redundanz in der Anfragebearbeitung. / Peer data management systems (PDMS) consist of a highly dynamic set of autonomous, heterogeneous peers connected with schema mappings. Queries submitted at a peer are answered with data residing at that peer and by passing the queries to neighboring peers. PDMS are the most general architecture for distributed integrated information systems. With no need for central coordination, PDMS are highly flexible. However, due to the typical massive redundancy in mapping paths, PDMS tend to be very inefficient in computing the complete query result as the number of peers increases. Additionally, information loss accumulates along mapping paths due to selections and projections in the mappings. Users usually accept concessions on the completeness of query answers in large-scale data sharing settings. Our approach turns completeness into an optimization goal and thus trades off benefit and cost of query answering. To this end, we propose several strategies that guide peers in deciding to which neighbors rewritten queries should be sent. In effect, the peers prune mappings that are expected to contribute little data. We propose a query optimization strategy that limits resource consumption and show that it can drastically increase efficiency while still yielding satisfactory completeness of the query result. To estimate the potential data contribution of mappings, we adopted self-tuning histograms for cardinality estimation. We developed techniques that ensure sufficient query feedback to adapt these statistics to massive changes in a PDMS. Additionally, histograms can serve to maintain statistics on data overlap between alternative mapping paths. Building on them, redundant query processing is reduced by avoiding overlapping areas of the multi-dimensional data space.
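The pruning strategy sketched in the abstract, forwarding a query only along mappings with a sufficiently high expected benefit-to-cost ratio within a time budget, might look roughly as follows. The class names, the efficiency measure and the numbers are illustrative assumptions, not the dissertation's actual algorithm.

```python
# Toy sketch of neighbor/mapping pruning in a PDMS peer: keep only mappings whose
# estimated result size per unit cost is high enough, subject to a time budget.
# Estimates would come from self-tuning histograms; all numbers here are made up.
from dataclasses import dataclass

@dataclass
class Mapping:
    neighbor: str
    est_cardinality: float   # expected result size, e.g. from a self-tuning histogram
    est_cost: float          # expected processing/transfer cost

def select_mappings(mappings: list[Mapping],
                    min_efficiency: float,
                    time_budget: float) -> list[Mapping]:
    # Rank mappings by expected efficiency and keep them while the budget lasts.
    ranked = sorted(mappings, key=lambda m: m.est_cardinality / m.est_cost, reverse=True)
    selected, spent = [], 0.0
    for m in ranked:
        efficiency = m.est_cardinality / m.est_cost
        if efficiency < min_efficiency or spent + m.est_cost > time_budget:
            continue
        selected.append(m)
        spent += m.est_cost
    return selected

candidates = [
    Mapping("peerA", est_cardinality=120, est_cost=2.0),
    Mapping("peerB", est_cardinality=10, est_cost=5.0),
    Mapping("peerC", est_cardinality=80, est_cost=1.0),
]
print([m.neighbor for m in select_mappings(candidates, min_efficiency=10.0, time_budget=3.0)])
# ['peerC', 'peerA'] under these toy numbers
```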
