  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Application of Definability to Query Answering over Knowledge Bases

Kinash, Taras January 2013 (has links)
Answering object queries (i.e., instance retrieval) is a central task in ontology-based data access (OBDA). Performing this task involves reasoning with respect to a knowledge base K (i.e., an ontology) over some description logic (DL) dialect L. As the expressive power of L grows, so does the complexity of reasoning with respect to K; it is therefore desirable to eliminate the need to reason with respect to K. In this work, we propose an optimization that improves the performance of answering object queries by eliminating the need to reason with respect to the knowledge base and, instead, utilizing cached query results when possible. In particular, given a DL dialect L, an object query C over some knowledge base K, and a set of cached query results S = {S1, ..., Sn} obtained from evaluating past queries, we rewrite C into an equivalent query D that can be evaluated with respect to an empty knowledge base, using cached query results S' = {Si1, ..., Sim}, where S' is a subset of S. The new query D is an interpolant for the original query C with respect to K and S. To find D, we leverage a tool for enumerating interpolants of a given sentence with respect to some theory. We describe a procedure that maps a knowledge base K, expressed in a description logic dialect of first-order logic, and an object query C into an equivalent theory and query that are input to the interpolant-enumerating tool, and that maps the resulting interpolants into an object query D that can be evaluated over an empty knowledge base. We show the efficacy of our approach through an experimental evaluation on the Lehigh University Benchmark (LUBM) data set, as well as on a synthetic data set, LUBMMOD, that we created by augmenting the LUBM ontology with additional axioms.
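The cache-reuse idea above can be illustrated with a toy sketch (not the thesis's DL machinery): once a new object query has been rewritten as a boolean combination of previously cached query extents, it can be answered from the cache alone, without reasoning over the knowledge base. The rewriting format, names and data below are all invented for illustration.

```python
# Toy evaluation of a rewritten query over cached results. `rewriting` is a
# nested tuple such as ("and", "Student", ("not", "Grad")), where string
# leaves name cached queries and `cache` maps those names to answer sets.

def answer_from_cache(rewriting, cache):
    if isinstance(rewriting, str):
        return set(cache[rewriting])
    op, *args = rewriting
    sets = [answer_from_cache(a, cache) for a in args]
    if op == "and":
        return set.intersection(*sets)
    if op == "or":
        return set.union(*sets)
    if op == "not":
        # Complement with respect to all cached individuals (closed world).
        domain = set().union(*cache.values())
        return domain - sets[0]
    raise ValueError(f"unknown operator: {op}")

cache = {"Student": {"ann", "bob", "eve"}, "Grad": {"bob"}}
undergrads = answer_from_cache(("and", "Student", ("not", "Grad")), cache)
# undergrads == {"ann", "eve"}
```

The real rewriting step — computing an interpolant D with respect to K and the cached queries — is the hard part handled by the interpolant-enumerating tool; this sketch only shows why such a rewriting makes knowledge-base reasoning unnecessary at query time.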
32

Solving University entrance assessment using information retrieval

Igor Cataneo Silveira 05 July 2018 (has links)
Answering questions posed in natural language is a key task in Artificial Intelligence. However, producing a successful Question Answering (QA) system is challenging, since it requires text understanding, information retrieval, information extraction and text production. The task is made even harder by the difficulties in collecting reliable datasets and in evaluating techniques, two pivotal points for machine learning approaches. This has led many researchers to focus on Multiple-Choice Question Answering (MCQA), a special case of QA where systems must select the correct answer from a small set of alternatives. One particularly interesting type of MCQA is solving standardized tests, such as foreign-language proficiency exams, elementary school science exams and university entrance exams. These exams provide easy-to-evaluate, challenging multiple-choice questions of varying difficulty about large, but limited, domains. The Exame Nacional do Ensino Médio (ENEM) is a high-school-level exam taken every year by students all over Brazil. It is widely used by Brazilian universities as an entrance exam and is the world's second-largest university entrance examination in number of registered candidates. The exam consists of writing an essay and solving a multiple-choice test comprising questions on four major topics: Humanities, Language, Science and Mathematics. Questions inside each major topic are not segmented by standard school disciplines (e.g. Geography, Biology, etc.) and often require interdisciplinary reasoning. Moreover, previous editions of the exam and their solutions are freely available online, making it a suitable benchmark for MCQA. In this work we automate solving the ENEM, focusing, for simplicity, on purely textual questions that do not require mathematical reasoning. We formulate the problem of answering a multiple-choice question as finding the candidate answer most similar to the statement.
We investigate two approaches for measuring the textual similarity of a candidate answer and the statement. The first approach treats this as a Text Information Retrieval (IR) problem, that is, as a problem of finding in a database the most relevant document to a query. Our queries are made of the statement plus a candidate answer, and we use three different corpora as databases: the first comprises plain-text articles extracted from a dump of the Portuguese-language Wikipedia; the second contains only the text given in the question's header; and the third is composed of pairs of questions and correct answers extracted from past ENEM assessments. The second approach is based on Word Embeddings (WE), a method for learning vectorial representations of words such that semantically similar words have close vectors. WE is used in two ways: to augment the IR queries by adding words related (according to the WE model) to those already in the query, and to create vectorial representations of the statement and candidate answers. Using these vectorial representations, we answer questions either directly, by selecting the candidate answer that maximizes the cosine similarity to the statement, or indirectly, by extracting features from the representations and feeding them into a classifier that decides which alternative is the answer. Alongside these two approaches, we investigate how to enhance them using WordNet, a structured lexical database in which words are connected according to relations such as synonymy and hypernymy. Finally, we combine different configurations of the two approaches and their WordNet variations by creating an ensemble of algorithms found by a greedy search. This ensemble chooses an answer by majority voting among its components. The first approach achieved an average accuracy of 24% using the headers, 25% using the pairs database and 26.9% using Wikipedia. The second approach achieved 26.6% accuracy using WE indirectly and 28% using it directly. The ensemble achieved 29.3% accuracy.
These results, slightly above random guessing (20%), suggest that these techniques can capture some of the necessary skills to solve standardized tests. However, more sophisticated techniques that perform text understanding and common-sense reasoning might be required to achieve human-level performance.
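The "direct" WE strategy described in this abstract can be sketched roughly as follows. For self-containment, this toy uses bag-of-words count vectors in place of learned word embeddings — an assumption, not the thesis's actual setup — but the selection rule (maximize cosine similarity between statement and alternative) is the one described.

```python
# Toy "direct" answer selection: vectorize statement and alternatives,
# pick the alternative whose vector is most cosine-similar to the statement.
import math
from collections import Counter

def cosine(u, v):
    # Cosine similarity between two sparse count vectors (Counters).
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def pick_answer(statement, alternatives):
    # Returns the index of the alternative most similar to the statement.
    sv = Counter(statement.lower().split())
    scores = [cosine(sv, Counter(a.lower().split())) for a in alternatives]
    return max(range(len(alternatives)), key=scores.__getitem__)

alts = ["the capital of France is Paris", "bananas are yellow"]
best = pick_answer("which city is the capital of France", alts)
# best == 0
```

With real word embeddings, `Counter` would be replaced by an averaged (or otherwise composed) dense vector per text, which lets semantically related but non-identical words contribute to the similarity.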
33

Improving Conversation Quality of Data-driven Dialog Systems and Applications in Conversational Question Answering

Baheti, Ashutosh January 2020 (has links)
No description available.
34

Automated question answering for clinical comparison questions

Leonhard, Annette Christa January 2012 (has links)
This thesis describes the development and evaluation of new automated Question Answering (QA) methods tailored to clinical comparison questions that give clinicians a rank-ordered list of MEDLINE® abstracts targeted to natural language clinical drug comparison questions (e.g. “Have any studies directly compared the effects of Pioglitazone and Rosiglitazone on the liver?”). Three corpora were created to develop and evaluate a new QA system for clinical comparison questions called RetroRank. RetroRank takes the clinician’s plain text question as input, processes it and outputs a rank-ordered list of potential answer candidates, i.e. MEDLINE® abstracts, that is reordered using new post-retrieval ranking strategies to ensure the most topically-relevant abstracts are displayed as high in the result set as possible. RetroRank achieves a significant improvement over the PubMed recency baseline and performs equal to or better than previous approaches to post-retrieval ranking relying on query frames and annotated data, such as the approach by Demner-Fushman and Lin (2007). The performance of RetroRank shows that it is possible to successfully use natural language input and a fully automated approach to obtain answers to clinical drug comparison questions. This thesis also introduces two new evaluation corpora of clinical comparison questions with “gold standard” references that are freely available and are a valuable resource for future research in medical QA.
35

Productivity Considerations for Online Help Systems

Shultz, Charles R. (Charles Richard) 05 1900 (has links)
The purpose of this study was to determine if task type, task complexity, and search mechanism would have a significant effect on task performance. The problem motivating this study is the potential for designers of online help systems to construct systems that can improve the performance of computer users when they need help.
36

A Web-based Question Answering System

Zhang, Dell, Lee, Wee Sun 01 1900 (has links)
The Web is apparently an ideal source of answers to a large variety of questions, due to the tremendous amount of information available online. This paper describes LAMP, a publicly accessible Web-based question answering system. A particular characteristic of this system is that it takes advantage only of the snippets in the search results returned by a search engine like Google. We think such a “snippet-tolerant” property is important for an online question answering system to be practical, because it is time-consuming to download and analyze the original web documents. The performance of LAMP is comparable to the best state-of-the-art question answering systems. / Singapore-MIT Alliance (SMA)
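The snippet-only idea can be caricatured in a few lines: score candidate answers by how often they occur in search-result snippets, never fetching the full documents. The snippets and candidates below are invented stand-ins, not LAMP's actual pipeline, which extracts and ranks answers rather than scoring a fixed candidate list.

```python
# Toy snippet-based answer ranking: count candidate-answer occurrences
# across search-result snippets and rank by frequency.
from collections import Counter

def rank_candidates(snippets, candidates):
    counts = Counter()
    for snip in snippets:
        text = snip.lower()
        for cand in candidates:
            counts[cand] += text.count(cand.lower())
    # Most frequently mentioned candidate first.
    return [c for c, _ in counts.most_common()]

snippets = [
    "Mount Everest is the highest mountain above sea level.",
    "At 8,849 m, Everest tops K2 as the highest peak.",
]
ranking = rank_candidates(snippets, ["Everest", "K2", "Kilimanjaro"])
# ranking[0] == "Everest"
```

The practical appeal, as the abstract notes, is latency: snippets arrive with the search results, so no additional document downloads or parsing are needed.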
37

Exploiting Lexical Regularities in Designing Natural Language Systems

Katz, Boris, Levin, Beth 01 April 1988 (has links)
This paper presents the lexical component of the START Question Answering system developed at the MIT Artificial Intelligence Laboratory. START is able to interpret correctly a wide range of semantic relationships associated with alternate expressions of the arguments of verbs. The design of the system takes advantage of the results of recent linguistic research into the structure of the lexicon, allowing START to attain a broader range of coverage than many existing systems.
38

Automatic question generation : a syntactical approach to the sentence-to-question generation case

Ali, Husam Deeb Abdullah Deeb January 2012 (has links)
Humans are often not very skilled at asking good questions. Thus, Question Generation (QG) and Question Answering (QA) have recently become two major challenges for the Natural Language Processing (NLP), Natural Language Generation (NLG), Intelligent Tutoring Systems, and Information Retrieval (IR) communities. In this thesis, we consider a form of the sentence-to-question generation task where, given a sentence as input, the QG system generates a set of questions for which the sentence contains, implies, or needs answers. Since the given sentence may be complex, our system first generates elementary sentences from the input using a syntactic parser. A Part-of-Speech (POS) tagger and a Named Entity Recognizer (NER) are used to encode the necessary information. Based on subject, verb, object and preposition information, sentences are classified in order to determine the types of questions to be generated. We conduct extensive experiments on the TREC-2007 (Question Answering Track) dataset. The scenario for the main task in the TREC-2007 QA track was that an adult, native speaker of English is looking for information about a target of interest. Using the given target, we filter the important sentences from the large sentence pool and generate possible questions from them. Once we have generated all the questions, we perform a recall-based evaluation: we count the overlap of our system-generated questions with the given questions in the TREC dataset. For a topic, we obtain a recall of 1.0 if all the given TREC questions are generated by our QG system, and 0.0 if none are. To validate the performance of our QG system, we took part in the First Question Generation Shared Task Evaluation Challenge (QGSTEC) in 2010. Experimental analysis and evaluation results, along with a comparison against the other participants of QGSTEC 2010, show the potential significance of our QG system.
/ x, 125 leaves : ill. ; 29 cm
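The recall-based evaluation described in this abstract can be sketched as follows. Exact matching after light normalization is a simplifying assumption here — the thesis's actual matching criterion may be looser — and the example questions are invented.

```python
# Toy recall computation for question generation: for one topic, recall is
# the fraction of reference questions that the QG system also produced.

def normalize(q):
    # Lowercase, trim, drop a trailing question mark, collapse whitespace.
    return " ".join(q.lower().strip().rstrip("?").split())

def qg_recall(generated, reference):
    gen = {normalize(q) for q in generated}
    ref = [normalize(q) for q in reference]
    return sum(1 for q in ref if q in gen) / len(ref)

ref = ["Who founded the company?", "When was it founded?"]
gen = ["who founded the company", "Where is it based?"]
# qg_recall(gen, ref) == 0.5  (one of two reference questions was generated)
```

A per-topic recall of 1.0 thus means every reference (TREC) question was generated, and 0.0 means none were; averaging over topics summarizes system coverage.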
39

Class-free answer typing

Pinchak, Christopher Unknown Date
No description available.
40

Statistical Source Expansion for Question Answering

Schlaefer, Nico 01 January 2011 (has links)
A source expansion algorithm automatically extends a given text corpus with related information from large, unstructured sources. While the expanded corpus is not intended for human consumption, it can be leveraged in question answering (QA) and other information retrieval or extraction tasks to find more relevant knowledge and to gather additional evidence for evaluating hypotheses. In this thesis, we propose a novel algorithm that expands a collection of seed documents by (1) retrieving related content from the Web or other large external sources, (2) extracting self-contained text nuggets from the related content, (3) estimating the relevance of the text nuggets with regard to the topics of the seed documents using a statistical model, and (4) compiling new pseudo-documents from nuggets that are relevant and complement existing information. In an intrinsic evaluation on a dataset comprising 1,500 hand-labeled web pages, the most effective statistical relevance model ranked text nuggets by relevance with 81% MAP, compared to 43% when relying on rankings generated by a web search engine, and 75% when using a multi-document summarization algorithm. These differences are statistically significant and result in noticeable gains in search performance in a task-based evaluation on QA datasets. The statistical models use a comprehensive set of features to predict the topicality and quality of text nuggets based on topic models built from seed content, search engine rankings and surface characteristics of the retrieved text. Linear models that evaluate text nuggets individually are compared to a sequential model that estimates their relevance given the surrounding nuggets. The sequential model leverages features derived from text segmentation algorithms to dynamically predict transitions between relevant and irrelevant passages. It slightly outperforms the best linear model while using fewer parameters and requiring less training time.
In addition, we demonstrate that active learning reduces the amount of labeled data required to fit a relevance model by two orders of magnitude with little loss in ranking performance. This facilitates the adaptation of the source expansion algorithm to new knowledge domains and applications. Applied to the QA task, the proposed method yields consistent and statistically significant performance gains across different datasets, seed corpora and retrieval strategies. We evaluated the impact of source expansion on search performance and end-to-end accuracy using Watson and the OpenEphyra QA system, and datasets comprising over 6,500 questions from the Jeopardy! quiz show and TREC evaluations. By expanding various seed corpora with web search results, we were able to improve the QA accuracy of Watson from 66% to 71% on regular Jeopardy! questions, from 45% to 51% on Final Jeopardy! questions and from 59% to 64% on TREC factoid questions. We also show that the source expansion approach can be adapted to extract relevant content from locally stored sources without requiring a search engine, and that this method yields similar performance gains. When combined with the approach that uses web search results, Watson's accuracy further increases to 72% on regular Jeopardy! data, 54% on Final Jeopardy! and 67% on TREC questions.
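The MAP figures quoted in this abstract can be reproduced in miniature: average precision rewards rankings that place relevant nuggets early, and MAP averages this over topics. The relevance labels below are invented; 1 marks a relevant nugget in ranked order.

```python
# Toy MAP computation for ranked nugget lists.

def average_precision(labels):
    # `labels` is a ranked list of 0/1 relevance judgments.
    hits, total = 0, 0.0
    for i, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / i  # precision at each relevant rank
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    return sum(average_precision(r) for r in rankings) / len(rankings)

# Two toy topics: a ranking with both relevant nuggets on top, and one
# whose only relevant nugget appears at rank 3.
map_score = mean_average_precision([[1, 1, 0], [0, 0, 1]])
# average_precision([1, 1, 0]) == 1.0; average_precision([0, 0, 1]) == 1/3
# map_score == (1.0 + 1/3) / 2
```

Under this metric, the gap reported above (81% for the statistical model versus 43% for raw search-engine order) directly reflects how much earlier the relevant nuggets appear in the model's ranking.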
