
Visual Question Answering in the Medical Domain

Medical images are extremely difficult to interpret for a person without expertise. The limited number of practitioners across the globe often suffer from fatigue because of the high number of cases, and this fatigue, both physical and mental, can induce human errors during diagnosis. In such scenarios, an additional opinion can help boost the confidence of the decision-maker. It is therefore crucial to have a reliable Visual Question Answering (VQA) system that can provide a "second opinion" on medical cases. However, most VQA systems available today are designed for general real-world images and are not specifically tailored to medical images. Moreover, a VQA system for medical images must cope with the limited amount of training data available in this domain. In this thesis, we develop a deep learning-based model for VQA on medical images that takes these challenges into account. Our system, MedFuseNet, aims to maximize learning with minimal complexity by breaking the problem into simpler tasks and then combining them to predict the answer. We tackle two types of answer prediction: categorization and generation. We conduct an extensive set of quantitative and qualitative analyses to evaluate the performance of MedFuseNet. Our results show that MedFuseNet outperforms other state-of-the-art methods in the literature on these tasks.

Master of Science

Medical images are extremely difficult to interpret for a person without expertise. The limited number of practitioners across the globe often suffer from fatigue because of the high number of cases, and this fatigue, both physical and mental, can induce human errors during diagnosis. In such scenarios, an additional opinion can help boost the confidence of the decision-maker. It is therefore crucial to have a reliable Visual Question Answering (VQA) system that can provide a "second opinion" on medical cases. However, most VQA systems available today are designed for general real-world images and are not specifically tailored to medical images. In this thesis, we propose MedFuseNet, an end-to-end deep learning-based system that predicts the answer to a question posed about an input image. It handles both close-ended and open-ended question-answer pairs. We conduct an extensive analysis to evaluate the performance of MedFuseNet. Our results show that MedFuseNet outperforms other state-of-the-art methods in the literature on these tasks.
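To illustrate what the "categorization" style of answer prediction means in general, here is a minimal sketch under assumptions that are not taken from the thesis: image and question features are already available as equal-length vectors, they are fused by an element-wise product, and a linear layer scores a fixed set of candidate answers. The function names (`fuse`, `softmax`, `classify_answer`) and the toy weights are hypothetical; this is not MedFuseNet's architecture. Answer generation would instead decode an answer token by token rather than choosing from a fixed list.

```python
import math
from typing import List, Sequence, Tuple


def fuse(image_feat: Sequence[float], question_feat: Sequence[float]) -> List[float]:
    """Hypothetical fusion step: element-wise (Hadamard) product of two feature vectors."""
    if len(image_feat) != len(question_feat):
        raise ValueError("feature vectors must have the same length")
    return [v * q for v, q in zip(image_feat, question_feat)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_answer(
    fused: Sequence[float],
    weights: Sequence[Sequence[float]],
    answers: Sequence[str],
) -> Tuple[str, float]:
    """Score every candidate answer with one weight row each, return the most probable answer."""
    if len(weights) != len(answers):
        raise ValueError("need exactly one weight row per candidate answer")
    logits = [sum(w * f for w, f in zip(row, fused)) for row in weights]
    probs = softmax(logits)
    best = max(range(len(answers)), key=lambda i: probs[i])
    return answers[best], probs[best]


# Toy usage with made-up numbers for a close-ended (yes/no) question.
image_vec = [0.9, 0.1, 0.5]
question_vec = [1.0, 0.2, 0.0]
fused_vec = fuse(image_vec, question_vec)
answer, confidence = classify_answer(
    fused_vec,
    weights=[[2.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
    answers=["yes", "no"],
)
print(answer, round(confidence, 3))
```

Treating close-ended questions as classification keeps the output space small, which is one common way to cope with limited training data; open-ended questions whose answers fall outside a fixed list are what motivate a generation approach.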

Keywords: Visual Question Answering; deep learning; medical images

Identifier	oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/107586
Date	21 July 2020
Creators	Sharma, Dhruv
Contributors	Computer Science; Reddy, Chandan K.; Viswanath, Bimal; Jiang, Jiepu
Publisher	Virginia Tech
Source Sets	Virginia Tech Theses and Dissertation
Detected Language	English
Type	Thesis
Format	ETD, application/pdf
Rights	In Copyright, http://rightsstatements.org/vocab/InC/1.0/
