1

Data Augmentation with Seq2Seq Models

Granstedt, Jason Louis 06 July 2017
Paraphrase sparsity complicates the training of question answering systems: syntactically diverse but semantically equivalent sentences can receive significantly different predicted output probabilities. We propose a method for generating an augmented paraphrase corpus that makes a visual question answering system more robust to paraphrases. The corpus is generated by concatenating two sequence-to-sequence models. To generate diverse paraphrases, we sample the neural network using diverse beam search. We evaluate the results on the standard VQA validation set. Our approach yields a significantly larger training dataset and vocabulary, but performs slightly worse on the validation split. Although not as fruitful as we had hoped, our work highlights further avenues for investigation: selecting better model parameters and developing a more sophisticated paraphrase filtering algorithm. The primary contributions of this work are the demonstration that decent paraphrases can be generated from sequence-to-sequence models and a pipeline for building an augmented dataset. / Master of Science
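As a rough illustration of the diverse beam search mentioned in this abstract, the following minimal sketch splits the beam into groups and penalises a token each time an earlier group has already chosen it at the same step (a Hamming-style diversity term). The function names, the `penalty` parameter, and the fixed toy distribution standing in for a sequence-to-sequence decoder are all assumptions made for illustration; this is not the thesis's implementation.

```python
import math


def toy_model(sequence):
    """Hypothetical stand-in for a decoder: same next-token log-probs at every step."""
    return {"a": math.log(0.6), "b": math.log(0.3), "c": math.log(0.1)}


def diverse_beam_search(next_log_probs, start, steps, groups, beam_per_group, penalty):
    """Group-wise beam search with a Hamming diversity penalty.

    Returns a flat list of (sequence, log_prob) pairs, one block per group.
    """
    # Every group starts from the same single-token prefix.
    group_beams = [[([start], 0.0)] for _ in range(groups)]
    for _ in range(steps):
        used = {}  # token -> how often earlier groups picked it at this step
        for g in range(groups):
            candidates = []
            for seq, score in group_beams[g]:
                for token, log_p in next_log_probs(seq).items():
                    raw = score + log_p
                    # The penalty affects only the ranking, not the stored score.
                    ranked = raw - penalty * used.get(token, 0)
                    candidates.append((ranked, raw, seq + [token]))
            candidates.sort(key=lambda c: c[0], reverse=True)
            kept = candidates[:beam_per_group]
            group_beams[g] = [(seq, raw) for _, raw, seq in kept]
            for _, _, seq in kept:
                used[seq[-1]] = used.get(seq[-1], 0) + 1
    return [beam for beams in group_beams for beam in beams]


plain = diverse_beam_search(toy_model, "<s>", 1, 2, 1, penalty=0.0)
diverse = diverse_beam_search(toy_model, "<s>", 1, 2, 1, penalty=1.0)
```

With `penalty=0.0` both groups select the same most likely token; with a sufficiently large penalty the second group is pushed towards a different continuation, which is the behaviour that makes the method useful for generating varied paraphrases.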
2

Robustness Analysis of Visual Question Answering Models by Basic Questions

Huang, Jia-Hong 11 1900
Visual Question Answering (VQA) models should have both high robustness and high accuracy. Unfortunately, most current VQA research focuses only on accuracy because proper methods for measuring the robustness of VQA models are lacking. There are two main modules in our algorithm. Given a natural language question about an image, the first module takes the question as input and outputs ranked basic questions, with similarity scores, for the given main question. The second module takes the main question, the image, and these basic questions as input and outputs the text-based answer to the main question about the given image. We claim that a robust VQA model is one whose performance does not change much when related basic questions are also made available to it as input. We formulate the basic question generation problem as a LASSO optimization, and also propose a large-scale Basic Question Dataset (BQD) and Rscore (a novel robustness measure) for analyzing the robustness of VQA models. We hope our BQD will be used as a benchmark to evaluate the robustness of VQA models, so as to help the community build more robust and accurate VQA models.
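The LASSO formulation can be sketched as follows, under the assumption that each basic question is represented by an embedding vector (a column) and the main question by a target vector, so that the sparse weights minimising 0.5 * ||A x - b||^2 + lam * ||x||_1 act as similarity scores. The embedding setup, the function names, and the coordinate-descent solver are illustrative assumptions, not details taken from the abstract.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(columns, target, lam, sweeps=200):
    """Minimise 0.5 * ||sum_j x_j * columns[j] - target||^2 + lam * ||x||_1."""
    weights = [0.0] * len(columns)
    residual = list(target)  # equals target - A x, with x initially zero
    col_sq = [sum(v * v for v in col) for col in columns]
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual, adding back its own part.
            rho = sum(c * r for c, r in zip(col, residual)) + col_sq[j] * weights[j]
            new_weight = soft_threshold(rho, lam) / col_sq[j]
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                weights[j] = new_weight
    return weights


def rank_basic_questions(questions, columns, target, lam):
    """Return (question, weight) pairs with non-zero weight, largest magnitude first."""
    weights = lasso_coordinate_descent(columns, target, lam)
    ranked = sorted(zip(questions, weights), key=lambda pair: -abs(pair[1]))
    return [(q, w) for q, w in ranked if w != 0.0]


# Hypothetical 3-dimensional embeddings, one column per basic question.
columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
main_question = [1.0, 0.3, 0.05]
ranking = rank_basic_questions(["q1", "q2", "q3"], columns, main_question, lam=0.1)
```

The L1 term drives the weights of weakly related questions to exactly zero, so the ranking naturally contains only a small set of basic questions, each with a score.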
3

Improving Visual Question Answering by Leveraging Depth and Adapting Explainability / Förbättring av Visual Question Answering (VQA) genom utnyttjandet av djup och anpassandet av förklaringsförmågan

Panesar, Amrita Kaur January 2022
To produce smooth human-robot interactions, it is important for robots to be able to answer users’ questions accurately and to provide a suitable explanation of how they arrive at the answer they provide. However, in the wild, the user may ask the robot questions relating to aspects of the scene that the robot is unfamiliar with, so the robot cannot answer correctly all of the time. In order to gain trust in the robot and resolve failure cases where an incorrect answer is provided, we propose a method that uses Grad-CAM explainability on RGB-D data. Depth is a critical component in producing more intelligent robots that can respond correctly most of the time, as some questions might rely on spatial relations within the scene, for which 2D RGB data alone would be insufficient. To our knowledge, this work is the first of its kind to leverage depth and an explainability module to produce an explainable Visual Question Answering (VQA) system. Furthermore, we introduce a new dataset for the task of VQA on RGB-D data, VQA-SUNRGBD. We evaluate our explainability method against Grad-CAM on RGB data and find that ours produces better visual explanations. When we compare our proposed model on RGB-D data against the baseline VQN network on RGB data alone, we show that ours performs better, particularly on questions relating to depth, such as those about the proximity of objects and the relative positions of objects to one another.
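For readers unfamiliar with Grad-CAM, the following minimal sketch shows only its generic combination step: each feature-map channel is weighted by the spatial mean of its gradient, the weighted maps are summed, and negative values are clipped to zero. It assumes the activations and gradients of a convolutional layer are already available as nested lists; the function name and the toy values are hypothetical, and the sketch does not reproduce the RGB-D adaptation proposed in the thesis.

```python
def grad_cam(activations, gradients):
    """Combine feature maps into a Grad-CAM heatmap.

    activations, gradients: nested lists indexed [channel][row][col], same shape.
    """
    rows = len(activations[0])
    cols = len(activations[0][0])
    heatmap = [[0.0] * cols for _ in range(rows)]
    for act, grad in zip(activations, gradients):
        # Channel weight: global average of the gradient over spatial positions.
        alpha = sum(sum(row) for row in grad) / (rows * cols)
        for i in range(rows):
            for j in range(cols):
                heatmap[i][j] += alpha * act[i][j]
    # ReLU keeps only regions with a positive influence on the target score.
    return [[max(0.0, value) for value in row] for row in heatmap]


# Toy example with two 2x2 channels (values are illustrative only).
activations = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 0.0], [0.0, 0.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(activations, gradients)
```

In practice the resulting low-resolution map is upsampled to the input size and overlaid on the image as a visual explanation.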
