1.
Multimodal Representation Learning for Visual Reasoning and Text-to-Image Translation. January 2018.
Abstract: Multimodal representation learning is a multi-disciplinary research field that aims to integrate information from multiple communicative modalities in a meaningful manner to help solve some downstream task. These modalities can be visual, acoustic, linguistic, haptic, etc. What counts as a "meaningful integration of information from different modalities" remains modality- and task-dependent. The downstream task can range from understanding one modality in the presence of information from other modalities to translating input from one modality to another. This thesis investigates both: understanding one modality given corresponding information in other modalities (image understanding for visual reasoning), and translating from one modality to the other (specifically, text-to-image translation).
Visual reasoning has been an active area of research in computer vision. It encompasses advanced image processing and artificial intelligence techniques to locate, characterize, and recognize objects, regions, and their attributes in an image in order to comprehend the image itself. One way of building a visual reasoning system is to ask the system questions about the image that require attribute identification, counting, comparison, multi-step attention, and reasoning. An intelligent system is considered to have a proper grasp of the image if it can answer such questions correctly and provide valid reasoning for its answers. This work investigates how such a system can be built by learning a multimodal representation between the given image and the questions, and demonstrates how background knowledge, specifically scene-graph information, can be incorporated into existing image understanding models when it is available.
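[Editor's note] The abstract does not specify an architecture, but one common way to realize such a joint image-question representation is to fuse a CNN image encoding with a recurrent question encoding and classify over an answer vocabulary. The following is a minimal PyTorch sketch of that general pattern; all module names, dimensions, and the multiplicative fusion are illustrative assumptions, not the thesis's actual model.

```python
import torch
import torch.nn as nn

class JointVQAModel(nn.Module):
    """Hypothetical sketch: fuse precomputed image features with a
    question encoding into a joint representation, then classify over
    an answer vocabulary."""

    def __init__(self, vocab_size, num_answers, img_dim=2048, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 300)          # word embeddings
        self.lstm = nn.LSTM(300, hidden, batch_first=True)  # question encoder
        self.img_proj = nn.Linear(img_dim, hidden)          # project CNN features
        self.classifier = nn.Linear(hidden, num_answers)    # answer scores

    def forward(self, img_feats, question_tokens):
        _, (h, _) = self.lstm(self.embed(question_tokens))  # h: (1, B, hidden)
        q = h.squeeze(0)                                    # (B, hidden)
        v = torch.relu(self.img_proj(img_feats))            # (B, hidden)
        joint = q * v                                       # multiplicative fusion
        return self.classifier(joint)

# Toy usage: a batch of 2 precomputed image feature vectors and questions.
model = JointVQAModel(vocab_size=1000, num_answers=30)
logits = model(torch.randn(2, 2048), torch.randint(0, 1000, (2, 8)))
print(logits.shape)  # torch.Size([2, 30])
```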
Multimodal learning provides an intuitive way of learning a joint representation between different modalities. Such a joint representation can be used to translate from one modality to the other. It also opens the way to learning a shared representation between these varied modalities and allows one to specify what this shared representation should capture. Using the surrogate task of text-to-image translation, this work investigates neural-network-based architectures for learning a shared representation between these two modalities, and proposes that such a shared representation can capture parts of different modalities that are equivalent in some sense. Specifically, given an image and a semantic description of certain objects present in the image, a shared representation between the text and image modalities was demonstrated to capture the parts of the image mentioned in the text; this capability was showcased on a publicly available dataset.
Masters Thesis, Computer Engineering, 2018.
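[Editor's note] One way a shared representation can capture "parts of the image being mentioned in the text" is text-conditioned attention over image region features, where the attention weights ground the text in regions. A hypothetical sketch of that idea follows; the dimensions and dot-product scoring are assumptions, not the thesis's architecture.

```python
import torch
import torch.nn as nn

class TextImageAttention(nn.Module):
    """Hypothetical sketch: a text query attends over image region
    features; the attention weights indicate which regions the text
    refers to."""

    def __init__(self, text_dim=512, region_dim=2048, shared=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, shared)      # text -> shared space
        self.region_proj = nn.Linear(region_dim, shared)  # regions -> shared space

    def forward(self, text_vec, region_feats):
        # text_vec: (B, text_dim); region_feats: (B, R, region_dim)
        q = self.text_proj(text_vec).unsqueeze(1)         # (B, 1, shared)
        k = self.region_proj(region_feats)                # (B, R, shared)
        scores = (q * k).sum(-1)                          # dot-product similarity
        weights = torch.softmax(scores, dim=-1)           # which regions are "mentioned"
        grounded = (weights.unsqueeze(-1) * k).sum(1)     # attended shared representation
        return grounded, weights

attn = TextImageAttention()
g, w = attn(torch.randn(2, 512), torch.randn(2, 49, 2048))
print(g.shape, w.shape)  # torch.Size([2, 256]) torch.Size([2, 49])
```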
2.
Constructive Adaptive Visual Analogy. Davies, Jim. 11 August 2004.
Visual knowledge appears to be an important part of problem solving, but its role in analogical problem solving is still somewhat mysterious. In this work I present the Constructive Adaptive Visual Analogy theory, which claims that visual knowledge is helpful for solving problems analogically and suggests a mechanism for how this might be accomplished. Through evaluations using an implemented computer program, cognitive models of some of the visual aspects of experimental participants, and a psychological experiment, I support four claims. First, visual knowledge alone is sufficient for transfer of some problem-solving procedures. Second, visual knowledge facilitates transfer even when non-visual knowledge might be available. Third, successful transfer of strongly-ordered procedures in which new objects are created requires the reasoner to generate intermediate knowledge states and mappings between the intermediate knowledge states of the source and target analogs. Finally, visual knowledge alone is insufficient for evaluating the results of transfer.
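[Editor's note] To make the third claim concrete, the following toy Python sketch illustrates, under heavy simplification, why step-by-step transfer with an evolving object mapping is needed when a procedure creates new objects mid-stream. The representation of states and actions here is entirely hypothetical and is not Davies's implementation.

```python
# Toy illustration: transfer a strongly-ordered procedure step by step,
# extending the source-to-target object mapping as new objects appear,
# rather than mapping the start states only once.

def transfer_procedure(source_steps, object_map):
    """Rewrite each step of the source procedure for the target analog.
    source_steps: list of (action, object) pairs in strict order.
    object_map: source-object -> target-object correspondences, which
    may grow as steps create new objects."""
    target_steps = []
    for action, obj in source_steps:
        if obj not in object_map:
            # A newly created object: extend the mapping so later steps
            # that refer to it stay consistent.
            object_map[obj] = f"target_{obj}"
        target_steps.append((action, object_map[obj]))
    return target_steps

source = [("draw", "circle"), ("split", "circle"), ("move", "fragment")]
print(transfer_procedure(source, {"circle": "tumor"}))
# [('draw', 'tumor'), ('split', 'tumor'), ('move', 'target_fragment')]
```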
3.
Visual problem solving in autism, psychometrics, and AI: the case of the Raven's Progressive Matrices intelligence test. Kunda, Maithilee. 03 April 2013.
Much of cognitive science research and almost all of AI research into problem solving have focused on the use of verbal or propositional representations. However, there is significant evidence that humans solve problems using different representational modalities, including visual or iconic ones. In this dissertation, I investigate visual problem solving from the perspectives of autism, psychometrics, and AI.
Studies of individuals on the autism spectrum show that they often use atypical patterns of cognition, and anecdotal reports have frequently mentioned a tendency to "think visually." I examined one precise characterization of visual thinking in terms of iconic representations. I then conducted a comprehensive review of data on several cognitive tasks from the autism literature and found numerous instances indicating that some individuals with autism may have a disposition towards visual thinking.
One task, the Raven's Progressive Matrices test, is of particular interest to the field of psychometrics, as it is one of the best single measures of general intelligence yet developed. Typically developing individuals are thought to solve the Raven's test using largely verbal strategies, especially on the more difficult subsets of test problems. In line with this view, computational models of information processing on the Raven's test have focused exclusively on propositional representations. However, behavioral and fMRI studies of individuals with autism suggest that these individuals may instead use a predominantly visual strategy across most or all test problems.
To examine visual problem solving on the Raven's test, I first constructed a computational model, called the Affine and Set Transformation Induction (ASTI) model, which uses a combination of affine transformations and set operations to solve Raven's problems using purely pixel-based representations of problem inputs, without any propositional encoding. I then performed four analyses using this model.
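[Editor's note] As a rough illustration of the ASTI approach described above, the following sketch induces a pixel-level transformation between two cells of a 2x2 Raven's-style matrix, applies it to a third cell, and picks the answer option closest to the prediction. It uses only a handful of flips and rotations; the actual ASTI model's space of affine transformations and set operations is richer, so treat this as a toy reconstruction, not the model itself.

```python
import numpy as np

# Candidate transformations over binary pixel arrays -- no propositional
# encoding of the problem is used anywhere.
TRANSFORMS = {
    "identity": lambda img: img,
    "flip_lr": np.fliplr,
    "flip_ud": np.flipud,
    "rot90": np.rot90,
}

def similarity(x, y):
    """Pixel overlap score between two same-shaped binary images."""
    return 1.0 - np.mean(x != y)

def solve(a, b, c, answer_options):
    # Induce: which candidate transform best explains A -> B?
    best_t = max(TRANSFORMS, key=lambda t: similarity(TRANSFORMS[t](a), b))
    prediction = TRANSFORMS[best_t](c)  # apply the induced transform to C
    # Select the answer option closest to the predicted image.
    scores = [similarity(prediction, opt) for opt in answer_options]
    return int(np.argmax(scores)), best_t

# Toy problem: B is A flipped left-right, so the answer should be C flipped.
a = np.zeros((8, 8), dtype=bool); a[:, :3] = True
b = np.fliplr(a)
c = np.zeros((8, 8), dtype=bool); c[:, :2] = True
idx, t = solve(a, b, c, [c, np.fliplr(c)])
print(idx, t)  # 1 flip_lr
```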
First, I tested the model against three versions of the Raven's test, to determine the sufficiency of visual representations for solving this type of problem. The ASTI model successfully solves 50 of the 60 problems on the Standard Progressive Matrices (SPM) test, comparable in performance to the best computational models that use propositional representations. Second, I evaluated model robustness in the face of changes to the representation of pixels and visual similarity. I found that varying these low-level representational commitments causes only small changes in overall performance. Third, I performed successive ablations of the model to create a new classification of problem types, based on which transformations are necessary and sufficient for finding the correct answer. Fourth, I examined whether patterns of errors made on the SPM can provide a window into whether a visual or verbal strategy is being used. While many of the observed error patterns were predicted by considering aspects of the model and of human behavior, I found that overall error patterns do not seem to provide a clear indicator of strategy type.
The main contributions of this dissertation include: (1) a rigorous definition and examination of a disposition towards visual thinking in autism; (2) a sufficiency proof, through the construction of a novel computational model, that visual representations can successfully solve many Raven's problems; (3) a new, data-based classification of problem types on the SPM; (4) a new classification of conceptual error types on the SPM; and (5) a methodology for analyzing, and an analysis of, error patterns made by humans and computational models on the SPM. More broadly, this dissertation contributes significantly to our understanding of visual problem solving.
4.
From Shape to Function: Acquisition of Teleological Models from Design Drawings by Compositional Analogy. Yaner, Patrick William. 18 October 2007.
Visual media are of great importance to designers. Understanding a new design, for example, often means understanding a drawing. From the perspective of artificial intelligence, this implies that automated knowledge acquisition in computer-aided design can productively occur using drawings as a knowledge source. However, this requires machines that are able to interpret design drawings.
I view the task of interpreting drawings as one of constructing a teleological model of the design depicted in the drawings, where the model enables causal and functional inferences about the depicted design. I have developed a novel analogical method for constructing a teleological model of a mechanical device from an unlabelled 2D line drawing. The source case is organized in a Drawing-Shape-Structure-Behavior-Function (DSSBF) abstraction hierarchy. This knowledge organization enables analogical mapping and transfer to occur at multiple levels of abstraction.
Given a target drawing and a relevant source case, my method of compositional analogy first constructs a graphical representation of the lines and the intersections in the target drawing, then uses the mappings at the level of line intersections to transfer the shape representations from the source case to the target. It next uses the mappings at the level of shapes to transfer the structural model of the device from the source to the target. Finally, the mappings from the source to the target structural model enable the transfer of behaviors and the functional specification from source to target, completing the analogy and yielding a complete DSSBF model of the input drawing. The Archytas system implements this method of compositional analogy and evaluates it in the domain of kinematic devices such as piston and crankshaft devices, door latches, and pulley systems.
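[Editor's note] A minimal sketch of the first stage of this method: representing drawings as graphs of lines and intersections and computing a line-level mapping via graph matching. The use of networkx's GraphMatcher and the triangle example are illustrative assumptions; Archytas's actual representations and mapping algorithm are more elaborate.

```python
import networkx as nx
from networkx.algorithms import isomorphism

def drawing_graph(lines, intersections):
    """Build a graph whose nodes are lines and whose edges are the
    intersections between pairs of lines."""
    g = nx.Graph()
    g.add_nodes_from(lines)
    g.add_edges_from(intersections)  # (line_a, line_b) pairs that cross
    return g

# Source: a triangle (three mutually intersecting lines).
source = drawing_graph(["s1", "s2", "s3"],
                       [("s1", "s2"), ("s2", "s3"), ("s1", "s3")])
# Target: another triangle with different line labels.
target = drawing_graph(["t1", "t2", "t3"],
                       [("t1", "t2"), ("t2", "t3"), ("t1", "t3")])

# A line-level mapping like this is what higher-level (shape, then
# structure) transfer would build on.
matcher = isomorphism.GraphMatcher(source, target)
if matcher.is_isomorphic():
    print(matcher.mapping)  # e.g. {'s1': 't1', 's2': 't2', 's3': 't3'}
```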
5.
Visual question answering with modules and language modeling. Pahuja, Vardaan. 04 1900.
No description available.