1 |
View-Based Strategies for 3D Object Recognition
Sinha, Pawan; Poggio, Tomaso. 21 April 1995
A persistent issue of debate in the area of 3D object recognition concerns the nature of the experientially acquired object models in the primate visual system. One prominent proposal in this regard has expounded the use of object centered models, such as representations of the objects' 3D structures in a coordinate frame independent of the viewing parameters [Marr and Nishihara, 1978]. In contrast to this is another proposal which suggests that the viewing parameters encountered during the learning phase might be inextricably linked to subsequent performance on a recognition task [Tarr and Pinker, 1989; Poggio and Edelman, 1990]. The 'object model', according to this idea, is simply a collection of the sample views encountered during training. Given that object centered recognition strategies have the attractive feature of leading to viewpoint independence, they have garnered much of the research effort in the field of computational vision. Furthermore, since human recognition performance seems remarkably robust in the face of imaging variations [Ellis et al., 1989], it has often been implicitly assumed that the visual system employs an object centered strategy. In the present study we examine this assumption more closely. Our experimental results with a class of novel 3D structures strongly suggest the use of a view-based strategy by the human visual system even when it has the opportunity of constructing and using object-centered models. In fact, for our chosen class of objects, the results seem to support a stronger claim: 3D object recognition is 2D view-based.
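To make the view-based idea concrete, here is a minimal sketch in which an object model is just a list of stored 2D views and recognition picks the nearest stored view. The representation of a view as a flat tuple of 2D feature coordinates, the Euclidean distance, and the name `nearest_view` are all illustrative assumptions and are not taken from the report.

```python
import math

def nearest_view(models, probe):
    """Return (object_name, distance) for the stored 2D view closest to the probe.

    `models` maps an object name to the list of views seen during training;
    each view is a flat tuple of 2D feature coordinates (an assumed encoding).
    """
    best_name, best_dist = None, math.inf
    for name, views in models.items():
        for view in views:
            d = math.dist(view, probe)  # Euclidean distance, Python 3.8+
            if d < best_dist:
                best_name, best_dist = name, d
    return best_name, best_dist

# Two hypothetical objects, each described only by the views seen in training.
models = {
    "A": [(0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 0.8, 0.2)],
    "B": [(0.0, 0.0, 0.0, 1.0)],
}
name, dist = nearest_view(models, (0.0, 0.0, 0.9, 0.1))
print(name, round(dist, 3))
```

In such a scheme the distance to the nearest stored view grows as the probe moves away from the trained viewpoints, which is the kind of viewpoint dependence a view-based account predicts.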
|
2 |
Contributions to 3D data processing and social robotics
Escalona, Félix. 30 September 2021
In this thesis, a study of artificial intelligence applied to 3D data and social robotics is carried out. The first part of the present document is dedicated to 3D object recognition. Object recognition consists of the automatic detection and categorisation of the objects that appear in a scene. This capability is important for social robots, as it allows them to understand and interact with their environment. Image-based methods have been widely studied with great results, but they rely only on visual features and can confuse different objects with similar appearances (for example, a real object and a picture depicting it), so 3D data can help to improve these systems by using topological features. For this part, we present several novel techniques that use pure 3D data. The second part of the thesis is about mapping the environment, which consists of constructing a map that a robot can use to locate itself. This capability enables a more elaborate navigation strategy, which is highly useful for a social robot that must interact with the different rooms of a house and the objects in them. In this part, we explore 2D and 3D maps and their refinement with object recognition. Finally, the third part of this work is about social robotics, which focuses on serving people through caring interaction rather than on performing a mechanical task. The previous parts address two main capabilities of a social robot, and this final part contains a survey of this kind of robot and of other projects that explore further aspects of them.
|
3 |
Feature extraction from 3D point clouds / Extração de atributos robustos a partir de nuvens de pontos 3D
Przewodowski Filho, Carlos André Braile. 13 March 2018
Computer vision is a research field in which images are the main object of study. One of its categories of problems is shape description, and object classification is an important example of an application that uses shape descriptors. Traditionally, these processes were performed on 2D images. With the large-scale development of new technologies and the falling price of equipment that generates 3D images, computer vision has adapted to this new scenario, extending the classic 2D methods to 3D. However, 2D methods depend mostly on variations of illumination and color, while 3D sensors provide depth, 3D shape and topological information in addition to color. Thus, different shape description and robust attribute extraction methods were studied, and on that basis new attribute extraction methods based on 3D data are proposed and described. The results obtained on well-known public datasets demonstrate the efficiency of the proposed methods and show that they compete with other state-of-the-art methods in this area: RPHSD (a method proposed in this dissertation) achieved 85.4% accuracy on the University of Washington RGB-D dataset, the second best accuracy on this dataset; COMSD (another proposed method) achieved 82.3% accuracy, ranking seventh; and CNSD (another proposed method) ranked ninth. In addition, the RPHSD and COMSD methods have relatively low processing complexity, so they achieve high accuracy with low computing time.
|
4 |
Construction of Appearance Manifold with Embedded View-Dependent Covariance Matrix for 3D Object Recognition
MURASE, Hiroshi; IDE, Ichiro; TAKAHASHI, Tomokazu; Lina. 01 April 2008
No description available.
|
5 |
3D Object Recognition By Geometric Hashing For Robotics Applications
Hozatli, Aykut. 01 February 2009
The main aim of 3D object recognition is to recognize objects under translation
and rotation. Geometric hashing is a rotation- and translation-invariant
approach that indexes the structural features of objects efficiently. In this
thesis, geometric hashing is used to store the geometric relationships between
discriminative surface properties based on surface curvature. The surface is
represented by shape index and splash: the shape index identifies surfaces of
a particular shape, and the splash introduces topological information. The
method is tested on 3D object databases and compared with other methods in
the literature.
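As an illustration of the shape index mentioned above, the following sketch computes a Koenderink-style shape index from the two principal curvatures of a surface point. The abstract does not state which variant or sign convention the thesis uses, so the [-1, 1] range, the sign convention, and the function name `shape_index` are assumptions.

```python
import math

def shape_index(k1: float, k2: float) -> float:
    """Shape index in [-1, 1] from two principal curvatures (assumed convention).

    The index depends only on the ratio of the curvatures, so it describes the
    local shape type independently of its size. It is undefined for planar
    points (both curvatures zero), where NaN is returned.
    """
    k_max, k_min = max(k1, k2), min(k1, k2)
    if k_max == 0.0 and k_min == 0.0:
        return math.nan
    # atan2 handles umbilic points (k_max == k_min) without dividing by zero.
    return (2.0 / math.pi) * math.atan2(k_max + k_min, k_max - k_min)

for k1, k2 in [(1.0, 1.0), (1.0, 0.0), (1.0, -1.0), (0.0, -1.0), (-1.0, -1.0)]:
    print(f"k1={k1:+.1f} k2={k2:+.1f} -> S={shape_index(k1, k2):+.2f}")
```

Whether the positive end of the range corresponds to a cap or a cup depends on the orientation chosen for the surface normal, so the labels attached to each interval should be checked against the convention of the data at hand.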
|
6 |
3D Geometric Hashing Using Transform Invariant Features
Eskizara, Omer. 01 April 2009
3D object recognition is performed using geometric hashing with transformation- and scale-invariant 3D surface features. The 3D features are extracted from object surfaces after a scale-space search, in which the size of each feature is also estimated.
The scale space is constructed from orientation-invariant surface curvature values, which classify the shape at each surface point. Extracted features are grouped into triplets, and orientation-invariant descriptors are defined for each triplet. Each pose of each object is indexed in a hash table using these triplets. For scale-invariant matching, cosine similarity is applied to the scale-variant triplet variables. Tests were performed on the Stuttgart database, where 66 poses of 42 objects are stored in the hash table during training and 258 poses of 42 objects are used during testing. A recognition rate of 90.97% is achieved.
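The indexing and matching steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's method: features are reduced to bare 3D points, the triplet descriptor is the sorted list of its three side lengths, the hash key quantises length ratios into 10 bins, and the names `side_lengths`, `hash_key`, `build_table` and `recognise` are hypothetical. It requires Python 3.8+ for `math.dist` and assumes the points of a triplet are distinct.

```python
import math
from collections import defaultdict
from itertools import combinations

def side_lengths(p, q, r):
    """Sorted pairwise distances of a triplet: unchanged by rotation and translation."""
    return sorted(math.dist(a, b) for a, b in combinations((p, q, r), 2))

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: unchanged by uniform scaling of either."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def hash_key(lengths, bins=10):
    """Quantise the two shorter sides, relative to the longest, into a table key."""
    longest = lengths[-1]
    return tuple(min(int(length / longest * bins), bins - 1) for length in lengths[:2])

def build_table(models):
    """Index every point triplet of every model under its quantised key."""
    table = defaultdict(list)
    for name, points in models.items():
        for triplet in combinations(points, 3):
            lengths = side_lengths(*triplet)
            table[hash_key(lengths)].append((name, lengths))
    return table

def recognise(table, scene_points, min_similarity=0.999):
    """Vote for the model whose stored triplets best match the scene triplets."""
    votes = defaultdict(int)
    for triplet in combinations(scene_points, 3):
        lengths = side_lengths(*triplet)
        for name, model_lengths in table.get(hash_key(lengths), []):
            # The raw side lengths vary with scale; their cosine similarity does not.
            if cosine_similarity(lengths, model_lengths) >= min_similarity:
                votes[name] += 1
    return max(votes, key=votes.get) if votes else None

models = {
    "wedge": [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)],
    "square": [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)],
}
table = build_table(models)
# The wedge rotated 90 degrees about z, scaled by 2 and translated by (5, -3, 1).
scene = [(5, -3, 1), (5, -1, 1), (1, -3, 1), (5, -3, 7)]
print(recognise(table, scene))
```

A fixed quantisation can split near-identical triplets across neighbouring bins when a ratio falls close to a bin edge; a fuller implementation would typically also probe adjacent bins.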
|
7 |
3D Object Recognition Using Scale Space Of Curvatures
Akagunduz, Erdem. 01 January 2011
In this thesis, a generic, scale- and resolution-invariant method for extracting 3D features from 3D surfaces is proposed. Features are extracted together with their scale (metric size and resolution) from range images using a scale space of 3D surface curvatures. Unlike previous scale-space approaches, connected components within the classified curvature scale space are extracted as features. Furthermore, the scales of features are extracted independently of the metric size or the sampling of the range images. Geometric hashing is used for object recognition, and scaled, occluded, and both scaled and occluded versions of range images from a 3D object database are tested. The experimental results under varying scale and occlusion are compared with SIFT in terms of recognition capability. In addition, to emphasize the importance of using a scale space of curvatures, comparative recognition results obtained with single-scale features are also presented.
|
9 |
Hand gesture recognition using sEMG and deep learning
Nasri, Nadia. 17 June 2021
In this thesis, a study of two rapidly growing fields within artificial intelligence is carried out. The first part of the present document is about 3D object recognition methods. Object recognition in general is about giving an intelligent system the ability to understand which objects appear in its input data. Any robot, from industrial robots to social robots, could benefit from such a capability to improve its performance and carry out high-level tasks. In fact, this topic has been widely studied, and some state-of-the-art object recognition methods outperform humans in terms of accuracy. Nonetheless, these methods are image-based; that is, they focus on recognizing visual features. This can be a problem in some contexts, as some objects look like other, different objects: for instance, a social robot may recognize a face in a picture, or an intelligent car may recognize a pedestrian on a billboard. A potential solution to this issue is to use three-dimensional data, so that the systems focus on topological rather than visual features. Thus, in this thesis, a study of 3D object recognition methods is carried out. The approaches proposed in this document, which take advantage of deep learning methods, take point clouds as input and are able to provide the correct category. We evaluated the proposals on a range of public challenges, datasets and real-life data with high success. The second part of the thesis is about hand pose estimation, a topic that focuses on recovering the hand's kinematics. A range of systems, from human-computer interaction and virtual reality to social robots, could benefit from such a capability, for instance to control a computer with seamless hand gestures or to interact with a social robot that understands human non-verbal communication. Thus, in the present document, hand pose estimation approaches are proposed. It is worth noting that the proposals take color images as input and provide 2D and 3D hand poses in the image plane and in Euclidean coordinate frames. Specifically, the hand poses are encoded as a collection of points that represent the joints of a hand, so that the full hand pose can be easily reconstructed from them. The methods are evaluated on custom and public datasets and integrated into a robotic hand teleoperation application with great success.
|
10 |
3D Object Representation and Recognition Based on Biologically Inspired Combined Use of Visual and Tactile Data
Rouhafzay, Ghazal. 13 May 2021
Recent research makes use of biologically inspired computation and artificial intelligence as efficient means to solve real-world problems. Humans perform remarkably well at extracting and interpreting visual information. When visual data is not available or fails to provide comprehensive information, for example because of occlusions, tactile exploration assists in interpreting and better understanding the environment. This cooperation between human senses can serve as an inspiration to embed a higher level of intelligence in computational models.
In the context of this research, in the first step, computational models of visual attention are explored to determine salient regions on the surface of objects. Two different approaches are proposed. The first approach takes advantage of a series of features that contribute to guiding human visual attention, namely color, contrast, curvature, edge, entropy, intensity, orientation, and symmetry, which are integrated to identify salient features on the surface of 3D objects. This model of visual attention also learns to adaptively weight each feature based on ground-truth data to ensure better compatibility with human visual exploration capabilities. The second approach uses a deep Convolutional Neural Network (CNN) for feature extraction from images collected from 3D objects and formulates saliency as a fusion map of the regions the CNN looks at while classifying the objects based on their geometrical and semantic characteristics. The main difference between the outcomes of the two algorithms is that the first approach results in saliencies spread over the surface of the objects, while the second approach highlights one or two regions with concentrated saliency. Therefore, the first approach is an appropriate simulation of visual exploration of objects, while the second approach successfully simulates the eye fixation locations on objects.
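The weighted integration of features in the first approach can be pictured with a small sketch. It assumes each feature is available as one value per surface vertex, normalises each feature map to [0, 1], and combines them with fixed weights; the thesis learns the weights from ground-truth data, which is not reproduced here, and the names `normalise` and `fuse_saliency` as well as the sample values are hypothetical.

```python
def normalise(values):
    """Min-max normalise a feature map to [0, 1]; a constant map becomes all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def fuse_saliency(feature_maps, weights):
    """Weighted sum of normalised per-vertex feature maps (weights rescaled to sum to 1)."""
    total = sum(weights[name] for name in feature_maps)
    n = len(next(iter(feature_maps.values())))
    saliency = [0.0] * n
    for name, values in feature_maps.items():
        w = weights[name] / total
        for i, v in enumerate(normalise(values)):
            saliency[i] += w * v
    return saliency

# Hypothetical per-vertex values for two of the eight features named above.
feature_maps = {"curvature": [0.0, 1.0, 2.0], "contrast": [5.0, 5.0, 10.0]}
weights = {"curvature": 3.0, "contrast": 1.0}  # illustrative, not learned
print(fuse_saliency(feature_maps, weights))
```

Normalising each map before weighting keeps a feature with a large numeric range from dominating the sum purely because of its units.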
In the second step, the first computational model of visual attention is used to determine scattered salient points on the surface of objects based on which simplified versions of 3D object models preserving the important visual characteristics of objects are constructed. Subsequently, the thesis focuses on the topic of tactile object recognition, leveraging the proposed model of visual attention. Beyond the sensor technologies which are instrumental in ensuring data quality, biological models can also assist in guiding the placement of sensors and support various selective data sampling strategies that allow exploring an object’s surface faster. Therefore, the possibility to guide the acquisition of tactile data based on the identified visually salient features is tested and validated in this research. Different object exploration and data processing approaches were used to identify the most promising solution.
Our experiments confirm the effectiveness of computational models of visual attention as a guide for data selection, both for simplifying the 3D representation of objects and for enhancing tactile object recognition. In particular, the current research demonstrates that: (1) the simplified representation of objects that preserves visually salient characteristics shows better compatibility with human visual capabilities than uniformly simplified models, and (2) tactile data acquired based on salient visual features are more informative about the objects’ characteristics and can be employed in tactile object manipulation and recognition scenarios.
In the last section, the thesis addresses the issue of transfer of learning from vision to touch. Inspired by biological studies that attest to similarities between the processing of visual and tactile stimuli in the human brain, the thesis studies the possibility of transferring learning from vision to touch using deep learning architectures and proposes a hybrid CNN that handles both visual and tactile object recognition.
|