
Towards Interpretable Vision Systems

Zhang, Peng 06 December 2017
Artificial intelligence (AI) systems are booming today and are used to solve new tasks or to improve performance on existing ones. However, most AI systems work in a black-box fashion, which prevents users from accessing their inner modules. This leads to two major problems: (i) users have no idea when the underlying system will fail, so it can fail abruptly without any warning or explanation, and (ii) users' limited understanding of the system can hinder efforts to push AI beyond the current state of the art. In this work, we address these problems in the following directions. First, we develop a failure prediction system that acts as an input filter: it raises a flag when the underlying system is likely to fail on the given input. Second, we develop a portfolio computer vision system, which predicts which of several candidate computer vision systems will perform best on the input. Both systems have the benefit of looking only at the inputs, without running the underlying vision systems, and they are applicable to any vision system. By equipping different applications with such systems, we confirm improved performance. Finally, instead of identifying errors, we develop more interpretable AI systems that reveal their inner modules directly. We take two tasks as examples: word semantic matching and Visual Question Answering (VQA). In VQA, we first address binary questions on abstract scenes, then extend to all question types on real images. In both cases, we treat attention as an important intermediate output. By explicitly forcing the systems to attend to the correct regions, we ensure the correctness of the systems. For semantic matching, we build a neural network that learns the matching directly, instead of relying on relation similarity between words. Across all the above directions, we show that by diagnosing errors and building more interpretable systems, we are able to improve the performance of current models. / Ph. D.
