Global ETD Search

1	A revised framework for human scene recognition / Linsley, Drew / January 2016 / Thesis advisor: Sean P. MacEvoy / For humans, healthy and productive living depends on navigating through the world and behaving appropriately along the way. But in order to do this, humans must first recognize their visual surroundings. The technical difficulty of this task is hard to comprehend: the number of possible scenes that can fall on the retina approaches infinity, and yet humans often recognize their surroundings rapidly and effortlessly. Understanding how humans accomplish this task has long been a goal of psychology and neuroscience and, more recently, has proven useful in inspiring and constraining the development of new algorithms for artificial intelligence (AI). In this thesis I begin by reviewing the current state of scene recognition research, drawing upon evidence from each of these areas, and discussing an unchallenged assumption in the literature: that scene recognition emerges from independently processing information about scenes' local visual features (i.e., the kinds of objects they contain) and global visual features (i.e., spatial parameters). Over the course of several projects, I challenge this assumption with a new framework for scene recognition that indicates a crucial role for information sharing between these resources. Development and validation of this framework will expand our understanding of scene recognition in humans and provide new avenues for research by extending these concepts to other domains spanning psychology, neuroscience, and AI. / Thesis (PhD), Boston College, 2016. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Psychology. / Biological vision; Computer vision; Scene recognition
2	The analytic edge - image reconstruction from edge data via the Cauchy Integral / Hay, Todd / 08 April 2016 / A novel algorithm for reconstructing images from edges (image gradients) follows from the Sokhotski-Plemelj Theorem of complex analysis, an elaboration of the standard Cauchy (Singular) Integral. This algorithm demonstrates the application of Singular Integral Equation methods to image processing, extending the more common use of Partial Differential Equations (e.g., those based on variants of the Diffusion or Poisson equations). The Cauchy Integral approach has a deep connection to, and sheds light on, the (linear and non-linear) diffusion equation, the retinex algorithm and energy-based image regularization. It extends the commonly understood local definition of an edge to a global, complex analytic structure, the analytic edge: the contrast-weighted kernel of the Cauchy Integral. Superposition of the set of analytic edges provides a "filled-in" image, which is the piecewise analytic image corresponding to the edge (gradient) data supplied. This is a fully parallel operation that avoids the time penalty associated with iterative solutions and is thus compatible with the short time (about 150 milliseconds) that is biologically available for the brain to construct a perceptual image from edge data. Although this algorithm produces an exact reconstruction of a filled-in image from the gradients of that image, slight modifications of it produce images that correspond to the perceptual reports of human observers presented with a wide range of "visual contrast illusion" images. / Neurosciences; Modeling; Biological vision; Computer vision; Filling-in; Cauchy Integral
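For reference, the Sokhotski-Plemelj theorem named in this abstract can be written in a standard textbook form. The symbols $\Gamma$ (a smooth closed contour), $\varphi$ (a Hölder-continuous density on $\Gamma$) and $\Phi$ (its Cauchy integral) are notation chosen here for illustration; the abstract does not say how the thesis maps image edges onto these objects.

```latex
\Phi(z) = \frac{1}{2\pi i}\int_{\Gamma}\frac{\varphi(\tau)}{\tau - z}\,d\tau,
  \qquad z \notin \Gamma,
\\[4pt]
\Phi^{\pm}(t) = \pm\frac{1}{2}\,\varphi(t)
  + \frac{1}{2\pi i}\,\mathrm{p.v.}\!\int_{\Gamma}\frac{\varphi(\tau)}{\tau - t}\,d\tau,
  \qquad t \in \Gamma,
\\[4pt]
\Phi^{+}(t) - \Phi^{-}(t) = \varphi(t).
```

Here $\Phi^{+}$ and $\Phi^{-}$ are the limits of $\Phi$ as $z$ approaches $t$ from the interior and exterior of $\Gamma$. The last line is the jump relation: a density defined only on the contour fixes a function that is analytic on either side of it. One plausible reading, offered as an assumption, is that this is what allows contrast data on edges to determine the regions between them.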
3	Understanding images in biological and computer vision / Schofield, A.J., Gilchrist, I.D., Bloj, Marina, Leonardis, A., Bellotto, N. / 12 May 2018 / This issue of Interface Focus is a collection of papers arising out of a Royal Society Discussion meeting entitled 'Understanding images in biological and computer vision', held at Carlton Terrace on the 19th and 20th February, 2018. There is a strong tradition of inter-disciplinarity in the study of visual perception and visual cognition. Many of the great natural scientists, including Newton [1], Young [2] and Maxwell (see [3]), were intrigued by the relationship between light, surfaces and perceived colour, considering both physical and perceptual processes. Brewster [4] invented both the lenticular stereoscope and the binocular camera but also studied the perception of shape-from-shading. More recently, Marr's [5] description of visual perception as an information processing problem led to great advances in our understanding of both biological and computer vision: both the computer vision and biological vision communities have a Marr medal. The recent successes of deep neural networks in classifying the images that we see, and the fMRI images that reveal the activity in our brains during the act of seeing, are both intriguing. The links between machine vision systems and biology may at times be weak, but the similarity of some of the operations is nonetheless striking [6]. This two-day meeting brought together researchers from the fields of biological and computer vision, robotics, neuroscience, computer science and psychology to discuss the most recent developments in the field. The meeting was divided into four themes: vision for action, visual appearance, vision for recognition and machine learning. / Images; Computer vision; Biological vision; Visual perception; Processes
4	Novel Methods for Multidimensional Image Segmentation / Pichon, Eric / 03 November 2005 / Artificial vision is the problem of creating systems capable of processing visual information. A fundamental sub-problem of artificial vision is image segmentation, the problem of detecting a structure in a digital image. Examples of segmentation problems include the detection of a road in an aerial photograph or the determination of the boundaries of the brain's ventricles in medical imagery. The extraction of structures allows for subsequent higher-level cognitive tasks. One of them is shape comparison. For example, if the brain ventricles of a patient are segmented, can their shapes be used for diagnosis? That is to say, do the shapes of the extracted ventricles more closely resemble those of healthy patients or those of patients suffering from schizophrenia? This thesis deals with the problem of image segmentation and shape comparison in the mathematical framework of partial differential equations. The contribution of this thesis is threefold: 1. A technique for the segmentation of regions is proposed. A cost functional is defined for regions based on a non-parametric functional of the distribution of image intensities inside the region. This cost is constructed to favor regions that are homogeneous. Regions that are optimal with respect to that cost can be determined with limited user interaction. 2. The use of direction information is introduced for the segmentation of open curves and closed surfaces. A cost functional is defined for structures (curves or surfaces) by integrating a local, direction-dependent pattern detector along the structure. Optimal structures, corresponding to the best match with the pattern detector, can be determined using efficient algorithms. 3. A technique for shape comparison based on the Laplace equation is proposed. Given two surfaces, one-to-one correspondences are determined that allow for the characterization of local and global similarity measures. The local differences among shapes (resulting, for example, from a segmentation step) can be visualized for qualitative evaluation by a human expert. They can also be used for classifying shapes into, for example, normal and pathological classes. / Hamilton-Jacobi-Bellman equation; Biological vision; Laplace equation; Direction information; Image processing; Hamilton-Jacobi equations
5	Aspects of memory and representation in cortical computation / Rehn, Martin / January 2006 / This thesis in computer science proposes models for how certain computational tasks can be carried out by the cerebral cortex. The starting point is partly known facts about how an area of the cerebral cortex is structured and functions, and partly established model classes in computational neurobiology, such as attractor memories and sparse coding systems. A neural network that produces an efficient sparse code, in the binary sense, for sensory (especially visual) input is presented. I show that this network, once trained on natural images, reproduces certain properties (receptive fields) of neurons in layer IV of the primary visual cortex, and that the codes it produces are suitable for storage in associative memory models. I further show how a simple autoassociative memory can be modified to work as a general sequence learning system by equipping it with synaptic dynamics. I investigate how an abstract attractor memory system can be implemented in a detailed model based on data about the cerebral cortex. This model can then be analysed with tools that simulate experiments that could be performed on a real cortex. The hypothesis that the cerebral cortex functions to a considerable extent as an attractor memory is examined and shown to lead to predictions about its connectivity structure. I also discuss methodological aspects of computational neurobiology today. / In this thesis I take a modular approach to cortical function. I investigate how the cerebral cortex may realise a number of basic computational tasks within the framework of its generic architecture. I present novel mechanisms for certain assumed computational capabilities of the cerebral cortex, building on the established notions of attractor memory and sparse coding. A sparse binary coding network for generating efficient representations of sensory input is presented. It is demonstrated that this network model reproduces well the simple cell receptive field shapes seen in the primary visual cortex and that its representations are efficient with respect to storage in associative memory. I show how an autoassociative memory, augmented with dynamical synapses, can function as a general sequence learning network. I demonstrate how an abstract attractor memory system may be realised on the microcircuit level, and how it may be analysed using tools similar to those used experimentally. I outline some predictions from the hypothesis that the macroscopic connectivity of the cortex is optimised for attractor memory function. I also discuss methodological aspects of modelling in computational neuroscience. / QC 20100916 / cerebral cortex; neural networks; attractor memory; sequence learning; biological vision; generative models; serial order; computational neuroscience; dynamical synapses; Computer science
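As a rough illustration of the "attractor memory" notion this abstract builds on, the sketch below implements a classical Hopfield-style autoassociative memory. It is a textbook stand-in, not the thesis's model, which adds dynamical synapses and microcircuit detail. The function names `train_hopfield` and `recall` and the toy patterns are assumptions made for this example.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian outer-product rule for +/-1 patterns, with no self-connections."""
    patterns = np.asarray(patterns, dtype=float)
    n_units = patterns.shape[1]
    weights = patterns.T @ patterns / n_units
    np.fill_diagonal(weights, 0.0)
    return weights

def recall(weights, state, max_sweeps=20):
    """Update units one at a time until a full sweep changes nothing."""
    state = np.array(state, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            new_value = 1.0 if weights[i] @ state >= 0 else -1.0
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break  # reached a fixed point (an attractor)
    return state

# Two orthogonal toy patterns (hypothetical data, chosen for illustration).
stored = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                   [1, 1, 1, 1, -1, -1, -1, -1]])
weights = train_hopfield(stored)

cue = stored[0].copy()
cue[0] = -cue[0]                 # corrupt one unit of the first pattern
restored = recall(weights, cue)  # the dynamics pull the cue back to the stored pattern
```

The stored patterns act as fixed points of the update rule, so a partially corrupted cue settles back onto the nearest stored pattern. This is the sense in which such a network is a content-addressable memory.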
6	Incorporating complex cells into neural networks for pattern classification / Bergstra, James / 03 1900 / In computational neuroscience, it has been hypothesized that the visual system, from the retina up to at least the primary visual cortex, continuously fits a probabilistic model with latent variables to its stream of perceptions. Neither the exact model nor the exact fitting method is known, but existing algorithms for fitting such models need to make conditional estimates of the latent variables. This can help us understand why the visual system might fit such a model; if the model is appropriate, these conditional estimates can also form an excellent representation for analysing the semantic content of perceived images. The work presented here uses image classification performance (discrimination between common types of objects) as a basis for comparing models of the visual system, and algorithms for fitting these models (viewed as probability densities) to images. This thesis (a) shows that models based on the complex cells of visual area V1 generalize better from labeled training examples than conventional neural networks, whose hidden units are more similar to V1 simple cells; (b) presents a new interpretation of complex-cell-based models of the visual system as probability distributions, together with new algorithms for fitting them to data; and (c) shows that these models form representations that are better for image classification after having been trained as probability models.
Two additional technical innovations that made this work possible are also described: a random search algorithm for selecting hyper-parameters, and a compiler for matrix-valued mathematical expressions, which can optimize these expressions for the central processor (CPU) and the graphics processor (GPU). / Computational neuroscientists have hypothesized that the visual system, from the retina to at least primary visual cortex, is continuously fitting a latent variable probability model to its stream of perceptions. It is not known exactly which probability model, nor exactly how the fitting takes place, but known algorithms for fitting such models require conditional estimates of the latent variables. This gives us a strong hint as to why the visual system might be fitting such a model; in the right kind of model those conditional estimates can also serve as excellent features for analyzing the semantic content of images perceived. The work presented here uses image classification performance (accurate discrimination between common classes of objects) as a basis for comparing visual system models, and algorithms for fitting those models as probability densities to images. This dissertation (a) finds that models based on visual area V1's complex cells generalize better from labeled training examples than conventional neural networks whose hidden units are more like V1's simple cells, (b) presents novel interpretations for complex-cell-based visual system models as probability distributions and novel algorithms for fitting them to data, and (c) demonstrates that these models form better features for image classification after they are first trained as probability models. Visual system models based on complex cells achieve some of the best results to date on the CIFAR-10 image classification benchmark, and samples from their probability distributions indicate that they have learnt to capture important aspects of natural images.
Two auxiliary technical innovations that made this work possible are also described: a random search algorithm for selecting hyper-parameters, and an optimizing compiler for matrix-valued mathematical expressions which can target both CPU and GPU devices. / machine learning; visual area V1; hyper-parameter selection; computer vision; biological vision
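The random search for hyper-parameters mentioned in this abstract can be sketched in a few lines. This is a generic illustration of the idea (sample each hyper-parameter independently, keep the best trial), not the dissertation's code. The function `random_search`, the search space, and the toy objective standing in for a validation error are all assumptions made for this example.

```python
import math
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Sample each hyper-parameter independently and keep the lowest-scoring trial."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_validation_error(params):
    """Hypothetical stand-in for training a model and measuring validation error."""
    lr_term = (math.log10(params["learning_rate"]) + 2.0) ** 2
    size_term = (params["n_hidden"] - 64) ** 2 / 1e4
    return lr_term + size_term

# Each entry maps a name to a sampler; the learning rate is drawn log-uniformly.
space = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-5, 0),
    "n_hidden": lambda rng: rng.randint(16, 256),
}

best_params, best_score = random_search(toy_validation_error, space, n_trials=200)
```

Unlike a grid, independent sampling tries a fresh value of every hyper-parameter on each trial. That helps when only a few of the hyper-parameters strongly affect the result.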
7	Incorporating complex cells into neural networks for pattern classification / Bergstra, James / 03 1900 / No description available. / Machine learning; Visual area V1; Hyper-parameter selection; Computer vision; Biological vision
8	Évaluation de modèles computationnels de la vision humaine en imagerie par résonance magnétique fonctionnelle / Evaluating Computational Models of Vision with Functional Magnetic Resonance Imaging / Eickenberg, Michael / 21 September 2015 / Functional magnetic resonance imaging (fMRI) makes it possible to measure brain activity through the blood flow supplied to neurons. In this thesis we evaluate the capacity of biologically plausible models from computer vision to represent the content of an image in a way similar to the brain. The main vision models evaluated are convolutional networks. Deep neural networks have made dramatic progress in recent years in a variety of fields. Earlier work identified similarities between a neural network and the brain in how visual information is processed at the first and last layers. We generalized these similarities by identifying brain regions corresponding to each stage of the neural network. The result is a progression of levels of complexity represented in the brain that corresponds to the known architecture of the visual areas: the deeper the convolutional layer, the more abstract its computations and the higher-level the brain function it models best.
Between edge detection in V1 and object specificity in inferotemporal cortex, both fairly well understood functions, we show for the first time that convolutional neural networks for object detection provide a tool for studying all the intermediate stages of visual processing carried out by the brain. A preliminary result leading to this one is also included in the manuscript: the study of the brain response to visual textures and its modelling with convolutional scattering networks. The other global aspect of this thesis is “decoding” models: in the preceding part, we predicted brain activity from a stimulus (so-called “encoding” models). Predicting the stimulus from brain activity is the inverse inference mechanism and can serve as evidence that this information is present in the signal. Most often, generalized linear models such as linear or logistic regression or SVMs are used, giving access to an interpretation of the model coefficients as a brain map. Their visual interpretation is difficult, however, because the underlying linear problem is either ill-posed and ill-conditioned or inadequately regularized, resulting in uninformative maps. Assuming a spatially contiguous and sparse organization, we build on the convex penalty formed by the sum of total variation and the L1 norm (TV+L1) to develop a penalty that groups an activation term and a spatial derivative term. This penalty has the property of setting most coefficients to zero while allowing the coefficients to vary freely within an activation zone, unlike TV+L1, which imposes flat activation zones.
This method improves the interpretability of the maps obtained in a cross-validation scheme based on the accuracy of the predictive model. In the context of encoding and decoding models, we also strive to improve data preprocessing. We study the behaviour of the fMRI signal in response to an impulse-like stimulation: the hemodynamic impulse response. To generate activation maps, instead of a classical linear model that imposes a fixed canonical impulse response, we use a bilinear model with a hemodynamic response that varies spatially but is fixed across stimulation events. We propose an efficient estimation algorithm and show a gain in predictive capacity in the analyses carried out, in both encoding and decoding. / Blood-oxygen-level dependent (BOLD) functional magnetic resonance imaging (fMRI) makes it possible to measure brain activity through blood flow to areas with metabolically active neurons. In this thesis we use these measurements to evaluate the capacity of biologically inspired models of vision coming from computer vision to represent image content in a way similar to the human brain. The main vision models used are convolutional networks. Deep neural networks have made unprecedented progress in many fields in recent years. Even strongholds of biological systems such as scene analysis and object detection have been addressed with enormous success. A body of prior work has been able to establish firm links between the first and last layers of deep convolutional nets and brain regions: the first layer and V1 essentially perform edge detection, and the last layer as well as inferotemporal cortex permit a linear read-out of object category. In this work we have generalized this correspondence to all intermediate layers of a convolutional net.
We found that each layer of a convnet maps to a stage of processing along the ventral stream, following the hierarchy of biological processing: along the ventral stream we observe a stage-by-stage increase in complexity. Between edge detection and object detection, for the first time we are given a toolbox to study the intermediate processing steps. A preliminary result toward this was obtained by studying the response of the visual areas to the presentation of visual textures and analysing it using convolutional scattering networks. The other global aspect of this thesis is “decoding” models: in the preceding part, we predicted brain activity from the stimulus presented (this is called “encoding”). Predicting a stimulus from brain activity is the inverse inference mechanism and can be used as an omnibus test for the presence of this information in the brain signal. Most often, generalized linear models such as linear or logistic regression or SVMs are used for this task, giving access to a coefficient vector the same size as a brain sample, which can thus be visualized as a brain map. However, interpretation of these maps is difficult, because the underlying linear system is either ill-posed and ill-conditioned or inadequately regularized, resulting in non-informative maps. Supposing a sparse and spatially contiguous organization of coefficient maps, we build on the convex penalty consisting of the sum of the total variation (TV) seminorm and the L1 norm (“TV+L1”) to develop a penalty grouping an activation term with a spatial derivative. This penalty sets most coefficients to zero but permits free smooth variations in active zones, as opposed to TV+L1, which creates flat active zones. This method improves the interpretability of brain maps obtained through cross-validation to determine the best hyperparameter. In the context of encoding and decoding models, we also work on improving data preprocessing in order to obtain the best performance.
We study the impulse response of the BOLD signal: the hemodynamic response function. To generate activation maps, instead of using a classical linear model with a fixed canonical response function, we use a bilinear model with a spatially variable hemodynamic response (but fixed across events). We propose an efficient optimization algorithm and show a gain in predictive capacity for encoding and decoding models on different datasets. / Functional MRI; Statistical learning/machine learning; Computer vision; Neuroscience; (Biological) vision; Convex optimization; Signal processing; Image processing; Medical imaging; Artificial neural networks; Convolutional networks
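To make the "TV+L1" baseline in this abstract concrete, the sketch below evaluates that penalty on a coefficient map: the L1 norm rewards sparsity and the total variation seminorm rewards spatially contiguous, piecewise-flat regions. It computes only the baseline penalty that the thesis builds on, not the thesis's own grouped penalty. The function name `tv_l1_penalty`, the `alpha` and `l1_ratio` weighting, and the forward-difference discretization are assumptions made for this example.

```python
import numpy as np

def tv_l1_penalty(coef_map, alpha=1.0, l1_ratio=0.5):
    """Return alpha * (l1_ratio * ||w||_1 + (1 - l1_ratio) * TV(w)).

    TV is the isotropic total variation: the sum over voxels of the
    Euclidean norm of the forward-difference gradient (zero at the far border).
    """
    w = np.asarray(coef_map, dtype=float)
    l1_term = np.abs(w).sum()

    squared_gradient = np.zeros_like(w)
    for axis in range(w.ndim):
        # Forward difference along this axis; repeating the last slice
        # makes the difference at the far border equal to zero.
        last_slice = np.take(w, [-1], axis=axis)
        diff = np.diff(w, axis=axis, append=last_slice)
        squared_gradient += diff ** 2
    tv_term = np.sqrt(squared_gradient).sum()

    return alpha * (l1_ratio * l1_term + (1.0 - l1_ratio) * tv_term)

# A 1-D step: two active coefficients (L1 = 2) and a single jump (TV = 1).
step = np.array([0.0, 0.0, 1.0, 1.0])
step_penalty = tv_l1_penalty(step)

# A flat 2x2 active block inside a 4x4 map of zeros.
block = np.zeros((4, 4))
block[1:3, 1:3] = 1.0
block_penalty = tv_l1_penalty(block)
```

Because the TV term charges only for jumps, a flat active block costs no more than its outline. This is consistent with the abstract's remark that TV+L1 tends to produce flat active zones.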
