Global ETD Search

31	Using Gaze Tracking to Tackle Duplicate Questions on Community Based Question Answering Websites: A Case Study of Ifixit Gandhi, Pankti 01 June 2018 The number of unanswered questions on Community based Question Answering (CQA) websites has increased significantly due to the rising number of duplicate questions. This is a serious problem, one that could lead to the decline of such beneficial websites. This thesis presents novel avenues that use gaze tracking technology and behavioral testing to tackle this problem. Based on prior studies of web search behavior, we assumed that adding contextual information (snippets) to the proposed related questions displayed on the 'Ask a Question' page of the CQA website iFixit would improve the asker experience and reduce askers' tendency to post a new duplicate question. In the first lab experiment, conducted with 8 participants, this web page was redesigned and compared to the original one. Results confirmed that participants were more likely to find an answer to their question on the redesigned page. A second experiment, conducted remotely on a larger sample of 74 participants, aimed to discover strategic attributes that increase the perceived similarity of question pairs. These attributes were used in the third lab experiment (20 participants) to redesign and assess the snippets from Experiment 1. Results indicated that snippets containing 'symptom(s)' and 'cause(s)' attributes constitute an incremental improvement over basic snippets: they are perceived as slightly more relevant and require significantly fewer gaze fixations on the asker's part. Community based Question Answering CQA Human Computer Interaction iFixit
32	Analyzing Answer Acceptance on Stack Overflow Using the Asker's Participation in Answer Comments Yiqun Zhang (16326174) 14 June 2023 CQA platforms face problems, particularly inactive participants and low-quality content, that hurt long-term sustainability (Srba & Bielikova, 2016). Recent CQA studies have revealed the value of answer comments in contributing to crowdsourced knowledge and in investigating answer acceptance. As a practical step forward from recent work aiming to remedy the sustainability issue of CQA, this study offers insights into how the asker's participation in the comments section of an answer affects the acceptance of that answer on Stack Overflow (a technical CQA site). A literature review was carried out to show the general scope of CQA research and to position this study relative to related work. Compared with existing work, the novelty of this study lies in using attributes derived from answer comments (e.g., AskerInCommentsOrNot) in the models for analyzing answer acceptance. The data collected was broadly about machine learning (ML) along with various other topics, making it representative of Stack Overflow. The 19,555 records were analyzed using the Chi-Square test and Logistic Regression. The findings indicate that the asker's participation in the comments section of an answer is associated with the acceptance of that answer, and that answers with more asker participation in their comments are more likely to be accepted. Broadly, this research supports the idea that answer comments are a valuable type of social interaction and feedback in CQA. It also has beneficial implications for stakeholders on Stack Overflow and potentially other technical CQA sites, including facilitating CQA flow, effectively evaluating helpful information, improving system designs, and motivating user participation. community question answering
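Entry 32 names the Chi-Square test as one of its analyses. As a rough illustration only, the statistic for a 2x2 table (asker commented or not, answer accepted or not) can be computed as below. The function name and the counts are hypothetical and are not taken from the thesis, which does not report its contingency table here.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table.

    Assumed layout (illustrative, not from the thesis):
    rows    = asker commented on the answer / did not comment
    columns = answer accepted / not accepted
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        raise ValueError("every row and column needs at least one observation")
    # Closed form for 2x2 tables: n * (ad - bc)^2 / (product of the marginals)
    return n * (a * d - b * c) ** 2 / denom


# Invented counts, for demonstration only.
example = [[30, 10], [20, 40]]
statistic = chi_square_2x2(example)
# With 1 degree of freedom, a statistic above about 3.841 is significant at the 0.05 level.
print(round(statistic, 3), statistic > 3.841)
```

A full analysis like the one described would also fit a logistic regression; this sketch covers only the association test.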
33	KGScore-Open: Leveraging Knowledge Graph Semantics For Open-QA Evaluation Hausman, Nicholas 01 June 2024 Evaluating active Question Answering (QA) systems as users ask questions outside of the original testing data has proven difficult, because answer quality is hard to gauge without ground truth responses. We propose KGScore-Open, a configurable system that scores question-answer pairs in Open Domain Question Answering (Open-QA) without ground truth answers by leveraging DBPedia, a Knowledge Graph (KG) derived from Wikipedia. The system maps entities from questions and answers to DBPedia nodes, constructs a Knowledge Graph based on these entities, and calculates a relatedness score. Our system is validated on multiple datasets, achieving up to 83% accuracy in differentiating relevant from irrelevant answers in the Natural Questions dataset, 55% accuracy in classifying correct versus incorrect answers (hallucinations) in the TruthfulQA and HaluEval datasets, and 54% accuracy on the QA-Eval task using the EVOUNA dataset. The contributions of this work include a novel scoring system for indicating both relevancy and answer confidence in Open-QA without the need for ground truth answers, demonstrated efficacy across various tasks, and an extendable framework applicable to different KGs for evaluating QA systems of other domains. Question Answering Knowledge Graph Question Answering Evaluation LLM DBPedia
34	Encyclopaedic question answering Dornescu, Iustin January 2012 Open-domain question answering (QA) is an established NLP task which enables users to search for specific pieces of information in large collections of texts. Instead of using keyword-based queries and a standard information retrieval engine, QA systems allow the use of natural language questions and return the exact answer (or a list of plausible answers) with supporting snippets of text. In the past decade, open-domain QA research has been dominated by evaluation fora such as TREC and CLEF, where shallow techniques relying on information redundancy have achieved very good performance. However, this performance is generally limited to simple factoid and definition questions because the answer is usually explicitly present in the document collection. Current approaches are much less successful in finding implicit answers and are difficult to adapt to more complex question types which are likely to be posed by users. In order to advance the field of QA, this thesis proposes a shift in focus from simple factoid questions to encyclopaedic questions: list questions composed of several constraints. These questions have more than one correct answer which usually cannot be extracted from one small snippet of text. To correctly interpret the question, systems need to combine classic knowledge-based approaches with advanced NLP techniques. To find and extract answers, systems need to aggregate atomic facts from heterogeneous sources as opposed to simply relying on keyword-based similarity. Encyclopaedic questions promote QA systems which use basic reasoning, making them more robust and easier to extend with new types of constraints and new types of questions. A novel semantic architecture is proposed which represents a paradigm shift in open-domain QA system design, using semantic concepts and knowledge representation instead of words and information retrieval. 
The architecture consists of two phases, analysis – responsible for interpreting questions and finding answers, and feedback – responsible for interacting with the user. This architecture provides the basis for EQUAL, a semantic QA system developed as part of the thesis, which uses Wikipedia as a source of world knowledge and employs simple forms of open-domain inference to answer encyclopaedic questions. EQUAL combines the output of a syntactic parser with semantic information from Wikipedia to analyse questions. To address natural language ambiguity, the system builds several formal interpretations containing the constraints specified by the user and addresses each interpretation in parallel. To find answers, the system then tests these constraints individually for each candidate answer, considering information from different documents and/or sources. The correctness of an answer is not proved using a logical formalism; instead, a confidence-based measure is employed. This measure reflects the validation of constraints from raw natural language, automatically extracted entities, relations and available structured and semi-structured knowledge from Wikipedia and the Semantic Web. When searching for and validating answers, EQUAL uses the Wikipedia link graph to find relevant information. This method achieves good precision and allows only pages of a certain type to be considered, but is affected by the incompleteness of the existing markup targeted towards human readers. In order to address this, a semantic analysis module which disambiguates entities is developed to enrich Wikipedia articles with additional links to other pages. The module increases recall, enabling the system to rely more on the link structure of Wikipedia than on word-based similarity between pages. It also allows authoritative information from different sources to be linked to the encyclopaedia, further enhancing the coverage of the system. 
The viability of the proposed approach was evaluated in an independent setting by participating in two competitions at CLEF 2008 and 2009. In both competitions, EQUAL outperformed standard textual QA systems as well as semi-automatic approaches. Having established a feasible way forward for the design of open-domain QA systems, future work will attempt to further improve performance to take advantage of recent advances in information extraction and knowledge representation, as well as by experimenting with formal reasoning and inferencing capabilities. 025.04
35	Knowledge Extraction for Hybrid Question Answering Usbeck, Ricardo 22 May 2017 Since the proposal of hypertext by Tim Berners-Lee to his employer CERN on March 12, 1989, the World Wide Web has grown to more than one billion Web pages and is still growing. With the later proposed Semantic Web vision, Berners-Lee et al. suggested an extension of the existing (Document) Web to allow better reuse, sharing and understanding of data. Both the Document Web and the Web of Data (which is the current implementation of the Semantic Web) grow continuously. This is a mixed blessing, as the two forms of the Web grow concurrently and most commonly contain different pieces of information. Modern information systems must thus bridge a Semantic Gap to allow holistic and unified access to a particular piece of information independent of the representation of the data. One way to bridge the gap between the two forms of the Web is the extraction of structured data, i.e., RDF, from the growing amount of unstructured and semi-structured information (e.g., tables and XML) on the Document Web. Note that unstructured data stands for any type of textual information like news, blogs or tweets. While extracting structured data from unstructured data allows the development of powerful information systems, it requires high-quality and scalable knowledge extraction frameworks to lead to useful results. The dire need for such approaches has led to the development of a multitude of annotation frameworks and tools. However, most of these approaches are not evaluated on the same datasets or using the same measures. The resulting Evaluation Gap needs to be tackled by a concise evaluation framework to foster fine-grained and uniform evaluations of annotation tools and frameworks over any knowledge bases. Moreover, with the constant growth of data and the ongoing decentralization of knowledge, intuitive ways for non-experts to access the generated data are required. 
Humans have adapted their search behavior to current Web data through access paradigms such as keyword search so as to retrieve high-quality results. Hence, most Web users only expect Web documents in return. However, humans think and most commonly express their information needs in natural language rather than in keyword phrases. Answering complex information needs often requires the combination of knowledge from various, differently structured data sources. Thus, we observe an Information Gap between natural-language questions and current keyword-based search paradigms, which in addition do not make use of the available structured and unstructured data sources. Question Answering (QA) systems provide an easy and efficient way to bridge this gap by allowing users to query data via natural language, thus reducing (1) a possible loss of precision and (2) a potential loss of time while reformulating the search intention into a machine-readable form. Furthermore, QA systems enable answering natural language queries with concise results instead of links to verbose Web documents. Additionally, they allow and encourage access to, and the combination of, knowledge from heterogeneous knowledge bases (KBs) within one answer. Consequently, three main research gaps are considered and addressed in this work: First, addressing the Semantic Gap between the unstructured Document Web and the Web of Data requires the development of scalable and accurate approaches for the extraction of structured data in RDF. This research challenge is addressed by several approaches within this thesis. This thesis presents CETUS, an approach for recognizing entity types to populate RDF KBs. Furthermore, our knowledge base-agnostic disambiguation framework AGDISTIS can efficiently detect the correct URIs for a given set of named entities. 
Additionally, we introduce REX, a Web-scale framework for RDF extraction from semi-structured (i.e., templated) websites which makes use of the semantics of the reference knowledge base to check the extracted data. The ongoing research on closing the Semantic Gap has already yielded a large number of annotation tools and frameworks. However, these approaches are currently still hard to compare since the published evaluation results are calculated on diverse datasets and evaluated based on different measures. On the other hand, the issue of comparability of results is not to be regarded as being intrinsic to the annotation task. Indeed, it is now well established that scientists spend between 60% and 80% of their time preparing data for experiments. That data preparation is such a tedious problem in the annotation domain is mostly due to the different formats of the gold standards as well as the different data representations across reference datasets. We tackle the resulting Evaluation Gap in two ways: First, we introduce a collection of three novel datasets, dubbed N3, to leverage the possibility of optimizing NER and NED algorithms via Linked Data and to ensure maximal interoperability, overcoming the need for corpus-specific parsers. Second, we present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools and frameworks on multiple datasets. The decentralized architecture behind the Web has led to pieces of information being distributed across data sources with varying structure. Moreover, the increasing demand for natural-language interfaces, as seen in current mobile applications, requires systems to deeply understand the underlying user information need. 
In conclusion, the natural language interface for asking questions requires a hybrid approach to data usage, i.e., simultaneously performing a search on full-texts and semantic knowledge bases. To close the Information Gap, this thesis presents HAWK, a novel entity search approach developed for hybrid QA based on combining structured RDF and unstructured full-text data sources. Semantic Annotation Named Entity Disambiguation Benchmarking Hybrid Question Answering Web of Data Web of Documents RDF ddc:000
36	Zodpovídání dotazů o obrázcích / Visual Question Answering Hajič, Jakub January 2017 Visual Question Answering (VQA) is a recently proposed multimodal task in the general area of machine learning. The input to this task consists of a single image and an associated natural language question, and the output is the answer to that question. In this thesis we propose two incremental modifications to an existing model which won the VQA Challenge in 2016 using multimodal compact bilinear pooling (MCB), a novel way of combining modalities. First, we add a language attention mechanism; on top of that, we introduce an image attention mechanism focusing on objects detected in the image ("region attention"). We also experiment with ways of combining these in a single end-to-end model. The thesis describes the MCB model, our extensions, and their two different implementations, and evaluates them on the original VQA challenge dataset for direct comparison with the original work.
37	EVIDENCE BASED MEDICAL QUESTION ANSWERING SYSTEM USING KNOWLEDGE GRAPH PARADIGM Aqeel, Aya 22 June 2022 No description available. Artificial Intelligence Biomedical Research Medicine NLP Natural Language Processing Question Answering QA Biomedical Question Answering Knowledge Graph KG, Machine Learning ML EBM Evidence-Based Medicine
38	Exploring Knowledge Vaults with ChatGPT : A Domain-Driven Natural Language Approach to Document-Based Answer Retrieval Hammarström, Mathias January 2023 Problem solving is a key aspect of many professions, including factory settings, where problems can slow production or even halt it completely. The specific domain of this project is a pulp factory, in collaboration with SCA Pulp. This study explores the potential of a question-answering system to enhance workers' ability to solve a problem by providing possible solutions from a natural language description of the problem. This is accomplished by giving workers a natural language interface to a large corpus of domain-specific documents. More specifically, the system works by augmenting ChatGPT with domain-specific documents as context for a question. The relevant documents are found using a retriever, which uses a vector representation of each document and compares the document vectors with the question vector. The results show that the system generated a correct answer 92% of the time, an incorrect answer 5% of the time, and no answer 3% of the time. The conclusion drawn from this study is that the implemented question-answering system is promising, especially when used by an expert or skilled worker who is less likely to be misled by incorrect answers. However, due to the study's small scale, further study is required to determine whether this system is ready to be deployed in real-world scenarios. Human-computer-interaction NLP LLM ChatGPT Question-Answering Information-Retrieval Software Engineering
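The retrieval step described in entry 38 (compare a question vector against document vectors, then pass the best matches to the language model as context) can be sketched as follows. This is a minimal illustration under stated assumptions: `embed` is a toy bag-of-words stand-in for whatever embedding model the thesis used, and the function names and sample documents are invented for the example.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, k=2):
    """Return the k documents whose vectors are closest to the question vector."""
    q_vec = embed(question)
    scored = [(cosine(q_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(question, documents, k=2):
    """Assemble the retrieved documents as context ahead of the question."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical documents, for demonstration only.
docs = [
    "pump pressure drops when the filter is clogged",
    "the canteen opens at noon",
    "replace the filter if pump pressure is low",
]
print(build_prompt("why is pump pressure low", docs))
```

The prompt string would then be sent to the language model; that call is omitted here because the abstract does not say how the model was accessed.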
39	Решение задачи QA для низкоресурсных языков тюркской языковой группы : магистерская диссертация / Solving the QA task for low-resource languages of the Turkic language group Медовиков, А. А., Medovikov, A. A. January 2024 The purpose of the work is to conduct a comprehensive study of the QA task for low-resource languages, taking Kazakh and Uzbek as examples, by creating models and datasets in these languages through machine translation of datasets from high-resource languages with the help of special markers. The hypothesis that language proximity matters when choosing the source language for translation is also investigated. The resulting QA models demonstrate better results for the Kazakh and Uzbek languages than all other publicly available models. MASTER'S THESIS ML NLP QA QUESTION ANSWERING EXTRACTIVE QUESTION ANSWERING LOW-RESOURCE LANGUAGE KAZAKH LANGUAGE UZBEK LANGUAGE TRANSFORMERS
40	Un système de question-réponse dans le domaine médical : le système Esculape / A question answering system in the medical domain : the Esculape system Embarek, Mehdi 04 July 2008 The medical domain currently has a very high volume of electronic documents, which makes it possible to search for any piece of medical information. However, exploiting this large quantity of data makes the search for a specific piece of information complex and time consuming. This difficulty has prompted the development of new, adapted search tools such as question-answering systems. This type of system allows a user to ask a question in natural language and returns a precise answer to the request instead of a set of documents deemed relevant, as is the case with search engines. The questions submitted to a question-answering system generally concern a type of object or a relationship between objects. In the case of a question such as "Who discovered America?", the object of the question is a person. In more specific areas, such as the medical domain, the types encountered are themselves more specific. The question "How should hematuria be investigated?" thus calls for an answer of the type medical examination. This dissertation studies the development of a question-answering system on good medical practices for general practitioners. The system will allow a doctor to consult a knowledge base during a consultation with a patient. We therefore present a search strategy adapted to the medical domain. Specifically, we present a method for analyzing medical questions and the approach adopted to find an answer to a submitted question. This approach first looks for an answer in a medical ontology built from semantic resources available for the specialty. If the answer is not found, the system applies automatically learned linguistic patterns to locate the answer in a collection of candidate documents. The value of our approach is illustrated through the question-answering system "Esculape", whose evaluation shows that explicitly taking medical knowledge into account improves the results of the different modules of the processing pipeline. Question-answering systems Medical domain Ontology Linguistic patterns
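The two-stage strategy in entry 40 (look in the ontology first, fall back to linguistic patterns over documents) can be sketched roughly as below. Everything in the sketch is an assumption made for illustration: the ontology is reduced to a dictionary, the pattern is hand-written rather than learned automatically, and the names `ONTOLOGY`, `PATTERNS`, and `answer_question` do not come from the Esculape system.

```python
import re

# Hypothetical miniature "ontology": (topic, expected answer type) -> answer.
ONTOLOGY = {
    ("hematuria", "examination"): "urinalysis",
}

# Hypothetical hand-written pattern per answer type; the thesis learns such
# patterns automatically, which this sketch does not attempt.
PATTERNS = {
    "examination": [
        re.compile(
            r"\b(?:an?\s+|the\s+)?(?P<answer>[\w\- ]+?) is used to detect (?P<topic>[\w\- ]+)",
            re.IGNORECASE,
        )
    ],
}


def answer_question(topic, answer_type, documents):
    """Return (answer, source), or None when neither stage finds an answer."""
    # Stage 1: direct lookup in the ontology.
    key = (topic.lower(), answer_type)
    if key in ONTOLOGY:
        return ONTOLOGY[key], "ontology"
    # Stage 2: apply the patterns for this answer type to candidate documents.
    for pattern in PATTERNS.get(answer_type, []):
        for doc in documents:
            for match in pattern.finditer(doc):
                if match.group("topic").strip().lower() == topic.lower():
                    return match.group("answer").strip(), "pattern"
    return None


documents = ["A chest X-ray is used to detect pneumonia."]
print(answer_question("hematuria", "examination", documents))
print(answer_question("pneumonia", "examination", documents))
```

Returning the source alongside the answer makes it easy to see which stage produced a result, which is useful when comparing the contribution of each module.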
