91.
Selectivity of Local Field Potentials in Macaque Inferior Temporal Cortex. Kreiman, Gabriel, Hung, Chou, Poggio, Tomaso, DiCarlo, James. 21 September 2004 (has links)
While single neurons in inferior temporal (IT) cortex show differential responses to distinct complex stimuli, little is known about the responses of populations of neurons in IT. We recorded single electrode data, including multi-unit activity (MUA) and local field potentials (LFP), from 618 sites in the inferior temporal cortex of macaque monkeys while the animals passively viewed 78 different pictures of complex stimuli. The LFPs were obtained by low-pass filtering the extracellular electrophysiological signal with a corner frequency of 300 Hz. As reported previously, we observed that spike counts from MUA showed selectivity for some of the pictures. Strikingly, the LFP data, which are thought to constitute an average over large numbers of neurons, also showed significantly selective responses. The LFP responses were less selective than the MUA responses, both in the proportion of selective sites and in the selectivity of each site. There was little overlap between the selectivity of MUA and LFP recordings from the same electrode. To assess the spatial organization of selective responses, we compared the selectivity of nearby sites recorded along the same penetration and sites recorded from different penetrations. We observed that MUA selectivity was correlated on spatial scales up to 800 μm, while the LFP selectivity was correlated over a larger spatial extent, with significant correlations between sites separated by several mm. Our data support the idea that there is some topographical arrangement to the organization of selectivity in inferior temporal cortex and that this organization may be relevant for the representation of object identity in IT.
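The correlation analysis described above can be sketched with a toy simulation; only the stimulus count of 78 comes from the abstract, while the tuning structure and noise level are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: stimulus tuning of recording sites over 78 pictures.
# Nearby sites share a common tuning component; a distant site does not.
n_stim = 78
common = rng.normal(size=n_stim)                 # shared local tuning
site_a = common + 0.3 * rng.normal(size=n_stim)  # two nearby sites
site_b = common + 0.3 * rng.normal(size=n_stim)
site_c = rng.normal(size=n_stim)                 # a distant, unrelated site

def tuning_correlation(x, y):
    """Pearson correlation between two sites' stimulus tuning curves."""
    return float(np.corrcoef(x, y)[0, 1])

r_near = tuning_correlation(site_a, site_b)  # high: shared selectivity
r_far = tuning_correlation(site_a, site_c)   # near zero
```

Plotting such pairwise correlations against electrode separation is how a spatial scale of correlated selectivity (e.g. the 800 μm figure for MUA) would be estimated.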
92.
Face processing in humans is compatible with a simple shape-based model of vision. Riesenhuber, Jarudi, Gilad, Sinha. 05 March 2004 (has links)
Understanding how the human visual system recognizes objects is one of the key challenges in neuroscience. Inspired by a large body of physiological evidence (Felleman and Van Essen, 1991; Hubel and Wiesel, 1962; Livingstone and Hubel, 1988; Ts'o et al., 2001; Zeki, 1993), a general class of recognition models has emerged which is based on a hierarchical organization of visual processing, with succeeding stages being sensitive to image features of increasing complexity (Hummel and Biederman, 1992; Riesenhuber and Poggio, 1999; Selfridge, 1959). However, these models appear to be incompatible with some well-known psychophysical results. Prominent among these are experiments investigating recognition impairments caused by vertical inversion of images, especially those of faces. It has been reported that faces that differ 'featurally' are much easier to distinguish when inverted than those that differ 'configurally' (Freire et al., 2000; Le Grand et al., 2001; Mondloch et al., 2002), a finding that is difficult to reconcile with the aforementioned models. Here we show that after controlling for subjects' expectations, there is no difference between 'featurally' and 'configurally' transformed faces in terms of the inversion effect. This result reinforces the plausibility of simple hierarchical models of object representation and recognition in cortex.
93.
Shape Representation in V4: Investigating Position-Specific Tuning for Boundary Conformation with the Standard Model of Object Recognition. Cadieu, Charles, Kouh, Minjoon, Riesenhuber, Maximilian, Poggio, Tomaso. 12 November 2004 (has links)
The computational processes in the intermediate stages of the ventral pathway responsible for visual object recognition are not well understood. A recent physiological study by A. Pasupathy and C. Connor in intermediate area V4, using contour stimuli, proposes that a population of V4 neurons displays 'object-centered', position-specific curvature tuning [18]. The 'standard model' of object recognition, a recently developed model [23] to account for recognition properties of IT cells (extending classical suggestions by Hubel, Wiesel and others [9, 10, 19]), is used here to model the response of the V4 cells described in [18]. Our results show that a feedforward, network-level mechanism can exhibit selectivity and invariance properties that correspond to the responses of the V4 cells described in [18]. These results suggest how object-centered, position-specific curvature tuning of V4 cells may arise from combinations of complex V1 cell responses. Furthermore, the model makes predictions about the responses of the same V4 cells studied by Pasupathy and Connor to novel gray-level patterns, such as gratings and natural images. These predictions suggest specific experiments to further explore shape representation in V4.
94.
A new biologically motivated framework for robust object recognition. Serre, Thomas, Wolf, Lior, Poggio, Tomaso. 14 November 2004 (has links)
In this paper, we introduce a novel set of features for robust object recognition, which exhibits outstanding performance on a variety of object categories while being capable of learning from only a few training examples. Each element of this set is a complex feature obtained by combining position- and scale-tolerant edge-detectors over neighboring positions and multiple orientations. Our system, motivated by a quantitative model of visual cortex, outperforms state-of-the-art systems on a variety of object image datasets from different groups. We also show that our system is able to learn from very few examples with no prior category knowledge. The success of the approach is also a suggestive plausibility proof for a class of feed-forward models of object recognition in cortex. Finally, we conjecture the existence of a universal overcomplete dictionary of features that could handle the recognition of all object categories.
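The core construction, combining oriented edge detectors with a max over neighboring positions for position tolerance, can be sketched as follows; the kernels and the toy patch are invented illustrations, not the authors' actual filters:

```python
import numpy as np

def edge_response(patch, kernel):
    """Valid-mode 2D cross-correlation of a patch with an edge kernel."""
    kh, kw = kernel.shape
    h, w = patch.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(patch[i:i+kh, j:j+kw] * kernel)
    return out

# Two simple oriented edge kernels (horizontal and vertical edges).
kernels = [np.array([[1., 1.], [-1., -1.]]),   # horizontal edge
           np.array([[1., -1.], [1., -1.]])]   # vertical edge

patch = np.zeros((6, 6))
patch[3:, :] = 1.0   # a horizontal step edge in the lower half

# Position-tolerant feature: max absolute response of each detector
# over all positions in the patch (the "max over neighborhood" idea).
feature = np.array([np.abs(edge_response(patch, k)).max() for k in kernels])
```

The max over positions makes the feature respond to the presence of an edge anywhere in the neighborhood, which is the source of the position tolerance described above.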
95.
Efficient Image Matching with Distributions of Local Invariant Features. Grauman, Kristen, Darrell, Trevor. 22 November 2004 (has links)
Sets of local features that are invariant to common image transformations are an effective representation to use when comparing images; current methods typically judge feature sets' similarity via a voting scheme (which ignores co-occurrence statistics) or by comparing histograms over a set of prototypes (which must be found by clustering). We present a method for efficiently comparing images based on their discrete distributions (bags) of distinctive local invariant features, without clustering descriptors. Similarity between images is measured with an approximation of the Earth Mover's Distance (EMD), which quickly computes the minimal-cost correspondence between two bags of features. Each image's feature distribution is mapped into a normed space with a low-distortion embedding of EMD. Examples most similar to a novel query image are retrieved in time sublinear in the number of examples via approximate nearest neighbor search in the embedded space. We also show how the feature representation may be extended to encode the distribution of geometric constraints between the invariant features appearing in each image. We evaluate our technique with scene recognition and texture classification tasks.
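The minimal-cost correspondence that EMD computes is easiest to see in one dimension, where matching sorted values is exactly optimal. This toy sketch (the scalar "descriptors" are invented for illustration) shows the quantity that the low-distortion embedding approximates for high-dimensional feature bags:

```python
import numpy as np

def emd_1d(bag_x, bag_y):
    """Exact EMD between two equal-size bags of scalar features.

    In 1D the minimal-cost correspondence simply pairs the sorted
    values of one bag with the sorted values of the other.
    """
    x, y = np.sort(bag_x), np.sort(bag_y)
    return float(np.abs(x - y).sum())

bag_a = np.array([0.0, 1.0, 2.0])
bag_b = np.array([2.1, 0.1, 1.1])   # bag_a shifted by 0.1, reordered
bag_c = np.array([5.0, 6.0, 7.0])   # a dissimilar bag

d_close = emd_1d(bag_a, bag_b)      # small: bags nearly match
d_far = emd_1d(bag_a, bag_c)        # large: bags are far apart
```

In higher dimensions computing the optimal correspondence is expensive, which is why the paper maps bags into a normed space where simple distances approximate EMD.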
96.
Ultra-fast Object Recognition from Few Spikes. Hung, Chou, Kreiman, Gabriel, Poggio, Tomaso, DiCarlo, James J. 06 July 2005 (has links)
Understanding the complex brain computations leading to object recognition requires quantitatively characterizing the information represented in inferior temporal cortex (IT), the highest stage of the primate visual stream. A read-out technique based on a trainable classifier is used to characterize the neural coding of selectivity and invariance at the population level. The activity of very small populations of independently recorded IT neurons (~100 randomly selected cells) over very short time intervals (as small as 12.5 ms) contains surprisingly accurate and robust information about both object 'identity' and 'category', which is furthermore highly invariant to object position and scale. Significantly, selectivity and invariance are present even for novel objects, indicating that these properties arise from the intrinsic circuitry and do not require object-specific learning. Within the limits of the technique, there is no detectable difference in the latency or temporal resolution of the IT information supporting so-called 'categorization' (a.k.a. basic-level) and 'identification' (a.k.a. subordinate-level) tasks. Furthermore, 'where' information, in particular information about stimulus location and scale, can also be read out from the same small population of IT neurons. These results show how it is possible to decode invariant object information rapidly, accurately and robustly from a small population in IT, and provide insights into the nature of the neural code for different kinds of object-related information.
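The read-out approach, training a classifier on population responses, can be sketched with simulated data; the population size and two-category setup loosely follow the abstract, while the response model and the least-squares classifier are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical read-out: simulate responses of an IT-like population of
# ~100 cells on 200 trials of a two-category task, then decode category
# with a linear classifier trained on the population activity.
n_cells, n_trials = 100, 200
tuning = rng.normal(size=n_cells)             # per-cell category preference
labels = rng.integers(0, 2, size=n_trials)    # category shown on each trial
signs = 2.0 * labels - 1.0                    # map {0,1} -> {-1,+1}
# population response: category signal plus trial-to-trial noise
X = np.outer(signs, tuning) + rng.normal(size=(n_trials, n_cells))

# least-squares linear read-out (ridge term for numerical stability)
w = np.linalg.solve(X.T @ X + 1e-3 * np.eye(n_cells), X.T @ signs)
pred = (X @ w > 0).astype(int)
accuracy = float((pred == labels).mean())
```

With ~100 cells even a simple linear read-out decodes the category almost perfectly here, mirroring the abstract's point that small populations carry surprisingly robust information.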
97.
Boosting a Biologically Inspired Local Descriptor for Geometry-free Face and Full Multi-view 3D Object Recognition. Yokono, Jerry Jun, Poggio, Tomaso. 07 July 2005 (has links)
Object recognition systems relying on local descriptors are increasingly used because of their perceived robustness with respect to occlusions and to global geometrical deformations. Descriptors of this type, based on a set of oriented Gaussian derivative filters, are used in our recognition system. In this paper, we explore a multi-view 3D object recognition system that does not use explicit geometrical information. The basic idea is to find discriminant features to describe an object across different views. A boosting procedure is used to select features out of a large pool of local features collected from the positive training examples. We describe experiments on face images with excellent recognition rates.
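A first-order oriented Gaussian derivative filter, the building block of such descriptors, can be constructed as follows; the size, scale, and normalization are illustrative choices, not the paper's exact parameters:

```python
import numpy as np

def gaussian_derivative(size=9, sigma=1.5, theta=0.0):
    """First-order derivative of a 2D Gaussian along direction theta."""
    r = np.arange(size) - size // 2
    xx, yy = np.meshgrid(r, r)
    # rotate coordinates so the derivative is taken along direction theta
    u = xx * np.cos(theta) + yy * np.sin(theta)
    g = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    f = -u * g                        # d/du of the Gaussian, up to a constant
    return f / np.abs(f).sum()        # normalize total filter energy

f0 = gaussian_derivative(theta=0.0)        # responds to vertical edges
f90 = gaussian_derivative(theta=np.pi / 2) # responds to horizontal edges
```

Responses of a bank of such filters at several orientations (and, in practice, several scales) form the local descriptor vector at each interest point; the boosting stage then selects the most discriminant of these.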
98.
Towards the Design of Neural Network Framework for Object Recognition and Target Region Refining for Smart Transportation Systems. Zhao, Yiheng. 13 August 2018 (has links)
Object recognition systems have a significant influence on modern life. Face, iris and fingerprint recognition applications are commonly applied for security purposes; ASR (Automatic Speech Recognition) is commonly used to generate subtitles for videos and audio, such as on YouTube; HWR (Handwriting Recognition) systems are essential at post offices for cheque and postcode detection; and ADAS (Advanced Driver Assistance Systems) are widely applied to improve the safety of drivers, passengers and pedestrians. Object recognition techniques are crucial and valuable in academia, commerce and industry.
Accuracy and efficiency are two important standards for evaluating the performance of recognition techniques. Accuracy covers how many objects can be detected in a real scene and how many of them can be correctly classified. Efficiency refers to the speed of system training and sample testing. Traditional object detection methods, such as a HOG (Histogram of Oriented Gradients) feature detector combined with an SVM (Support Vector Machine) classifier, cannot compete with neural network frameworks in either efficiency or accuracy. Since neural networks have better performance and potential for improvement, it is worth gaining insight into this field to design more advanced recognition systems.
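The HOG idea mentioned above, accumulating gradient orientations over a cell into a magnitude-weighted histogram, can be sketched minimally (the cell size and bin count here are illustrative, not the standard HOG parameters):

```python
import numpy as np

def hog_cell(cell, n_bins=4):
    """Magnitude-weighted histogram of gradient orientations for one cell."""
    gy, gx = np.gradient(cell.astype(float))       # image gradients
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)        # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    for b, m in zip(bins.ravel(), mag.ravel()):
        hist[b] += m                               # vote weighted by magnitude
    return hist

cell = np.tile(np.arange(8.0), (8, 1))   # intensity ramp along x
hist = hog_cell(cell)                    # all votes land in the 0-rad bin
```

A full HOG descriptor concatenates many such cell histograms with block normalization; the SVM is then trained on that vector.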
In this thesis, we list and analyze sophisticated techniques and frameworks for object recognition. To understand the mathematical theory behind network design, state-of-the-art networks from the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) are studied. Based on this analysis and the concept of edge detectors, a simple CNN (Convolutional Neural Network) structure is designed as a trial to explore the possibility of using a network of high width and low depth for region proposal selection, object recognition and target region refining. We adopt LeNet as the template, taking advantage of the multi-kernel design of GoogLeNet.
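The multi-kernel idea borrowed from GoogLeNet, running filters of several sizes over the same input in parallel and stacking the resulting feature maps, can be sketched as follows; the kernel sizes and averaging filters are illustrative, not the thesis's actual design:

```python
import numpy as np

def conv2d_same(img, k):
    """'Same'-size 2D cross-correlation with zero padding."""
    kh, kw = k.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(pad[i:i+kh, j:j+kw] * k)
    return out

img = np.random.default_rng(2).normal(size=(16, 16))

# Parallel branches with different receptive field sizes, as in an
# Inception-style block: 1x1 identity, 3x3 average, 5x5 average.
kernels = [np.ones((1, 1)), np.ones((3, 3)) / 9.0, np.ones((5, 5)) / 25.0]
features = np.stack([conv2d_same(img, k) for k in kernels])  # channel axis
```

Stacking branches like this widens the network at a fixed depth, which is the "high width and low depth" trade-off the thesis explores.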
We conducted experiments to test the performance of this simple structure on vehicle and face categories from the ImageNet dataset. The accuracy for single object detection is 81% on average, and for plural object detection 73.5%. We refined the networks in many aspects to reach a final accuracy of 95% for single object detection and 89% for plural object detection.
99.
On-the-fly visual category search in web-scale image collections. Chatfield, Ken. January 2014 (has links)
This thesis tackles the problem of large-scale visual search for categories within large collections of images. Given a textual description of a visual category, such as 'car' or 'person', the objective is to retrieve images containing that category from the corpus quickly and accurately, and without the need for auxiliary meta-data or, crucially and in contrast to previous approaches, expensive pre-training. The general approach to identifying different visual categories within a dataset is to train classifiers over features extracted from a set of training images. The performance of such classifiers relies heavily on sufficiently discriminative image representations, and many methods have been proposed which involve aggregating local appearance features into rich bag-of-words encodings. We begin by conducting a comprehensive evaluation of the latest such encodings, identifying best-of-breed practices for training powerful visual models using these representations. We also contrast these methods with the latest breed of Convolutional Network (ConvNet) based features, thus developing a state-of-the-art architecture for large-scale image classification. Following this, we explore how a standard classification pipeline can be adapted for use in a real-time setting. One of the major issues, particularly with bag-of-words based methods, is the high dimensionality of the encodings, which causes ranking over large datasets to be prohibitively expensive. We therefore assess different methods for compressing such features, and further propose a novel cascade approach to ranking which both reduces ranking time and improves retrieval performance. Finally, we explore the problem of training visual models on-the-fly, making use of visual data dynamically collected from the web to train classifiers on demand.
On this basis, we develop a novel GPU architecture for on-the-fly visual category search which is capable of retrieving previously unknown categories over unannotated datasets of millions of images in just a few seconds.
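The bag-of-words encodings evaluated in the thesis rest on a simple operation: assigning each local descriptor to its nearest visual word and counting. A minimal sketch, with a tiny invented two-word vocabulary for illustration:

```python
import numpy as np

def bow_encode(descriptors, vocabulary):
    """Hard-assignment bag-of-words histogram, L1-normalized."""
    # squared distances from every descriptor to every visual word
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                      # nearest visual word
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

vocab = np.array([[0.0, 0.0], [10.0, 10.0]])       # two visual words
desc = np.array([[0.1, 0.2], [9.8, 10.1],          # descriptors from an image
                 [0.0, 0.1], [10.2, 9.9]])
h = bow_encode(desc, vocab)                        # [0.5, 0.5]
```

Real vocabularies have thousands to millions of words, which is exactly the dimensionality problem the compression and cascade-ranking contributions address.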
100.
Engagement Recognition in an E-learning Environment Using Convolutional Neural Network. Jiang, Zeting, Zhu, Kaicheng. January 2021 (has links)
Background. Distance education has rapidly become popular among students and teachers, changing the traditional way of teaching in the classroom. In these circumstances, students are required to learn independently, which also brings drawbacks: teachers cannot obtain real-time feedback on students' engagement. This thesis explores the feasibility of applying a lightweight model to recognize student engagement and the practicality of such a model in a distance education environment. Objectives. This thesis aims to develop and apply a lightweight model, based on a Convolutional Neural Network (CNN) and with acceptable performance, to recognize the engagement of students in a distance learning environment, and to evaluate and compare the optimized model with the selected original and other models on different performance metrics. Methods. This thesis uses experiments and a literature review as research methods. The literature review is conducted to select effective CNN-based models for engagement recognition and feasible strategies for optimizing the chosen models. The selected and optimized models are trained, tested, evaluated and compared as independent variables in the experiments; the performance of the different models is the dependent variable. Results. Based on the literature review, ShuffleNet v2 is selected as the most suitable CNN architecture for the engagement recognition task, with Inception v3 and ResNet used as classic CNN architectures for comparison. An attention mechanism and replacement of the activation function are used as optimization methods for ShuffleNet v2. Pre-experiment results show that ShuffleNet v2 using the Leaky ReLU function has the highest accuracy among the activation functions compared.
The experimental results show that the optimized model performs better on engagement recognition tasks than the baseline ShuffleNet v2, ResNet v2 and Inception v3 models. Conclusions. Analysis of the experimental results shows that the optimized ShuffleNet v2 has the best performance and is the most suitable model for real-world applications and deployment on mobile platforms.
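The activation swap evaluated above can be stated in a few lines: Leaky ReLU passes a small fraction of negative inputs where plain ReLU outputs zero (the slope value below is a common default, not necessarily the thesis's setting):

```python
import numpy as np

def relu(x):
    """Standard ReLU: zero for all negative inputs."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: keeps a small slope alpha for negative inputs."""
    return np.where(x >= 0.0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
y_relu = relu(x)          # negative inputs are zeroed
y_leaky = leaky_relu(x)   # negative inputs are scaled by alpha
```

Keeping a nonzero gradient on negative inputs avoids "dead" units during training, which is the usual motivation for preferring Leaky ReLU in lightweight networks.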