1

Shoulder Keypoint-Detection from Object Detection

Kapoor, Prince 22 August 2018 (has links)
This thesis presents a detailed study of the Convolutional Neural Network (CNN) architectures that have helped computer vision researchers achieve state-of-the-art performance on classification, detection, segmentation, and many other image analysis challenges. Since the advent of deep learning, CNNs have been used in almost all computer vision applications, which makes it important to understand the fine details of these feature extractors and to weigh the pros and cons of each one carefully. For our experiments, we explored an object detection task using a model architecture that strikes a balance between computational cost and accuracy: the LSTM-Decoder. We ran the model with different CNN feature extractors and identified their pros and cons in various scenarios. The results obtained on different datasets show that the CNN plays a major role in achieving higher accuracy, and we also achieved accuracy comparable to the state of the art on the Pedestrian Detection Dataset. As an extension to object detection, we implemented two model architectures that find shoulder keypoints. In the first, the annotation detected by the object detector is used to generate a small cropped image, which is fed into a small cascade network trained to detect shoulder keypoints. The second strategy uses the same object detection model and fine-tunes its weights to predict shoulder keypoints. So far, we have generated results for shoulder keypoint detection only. However, the idea could be extended to full-body pose estimation by modifying the cascaded network for that purpose, which is an important topic for the future work of this thesis.
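The first shoulder-keypoint strategy in this abstract starts by cutting a small image out of the detector's output box. The sketch below illustrates only that cropping step, under assumptions that are not in the abstract: boxes are (x_min, y_min, x_max, y_max) in pixels, the image is a plain list of rows, and the box is padded by a fraction of its size before clamping to the image. The names padded_crop_bounds, crop, and pad are hypothetical, and the cascade network itself is not shown.

```python
from typing import List, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def padded_crop_bounds(box: Box, image_w: int, image_h: int, pad: float = 0.1) -> Box:
    """Expand a detection box by a fraction of its size and clamp it to the image."""
    x0, y0, x1, y1 = box
    dx = int(round((x1 - x0) * pad))  # horizontal margin in pixels
    dy = int(round((y1 - y0) * pad))  # vertical margin in pixels
    return (
        max(0, x0 - dx),
        max(0, y0 - dy),
        min(image_w, x1 + dx),
        min(image_h, y1 + dy),
    )


def crop(image: List[List[int]], bounds: Box) -> List[List[int]]:
    """Cut the region out of a row-major image stored as a list of rows."""
    x0, y0, x1, y1 = bounds
    return [row[x0:x1] for row in image[y0:y1]]


if __name__ == "__main__":
    # A 10x10 dummy image whose pixel value encodes its position.
    image = [[10 * y + x for x in range(10)] for y in range(10)]
    bounds = padded_crop_bounds((2, 2, 6, 6), image_w=10, image_h=10, pad=0.5)
    patch = crop(image, bounds)
    print(bounds, len(patch), len(patch[0]))
```

In a real pipeline the patch would then be resized and passed to the keypoint network; padding the box is one common way to avoid cutting off shoulders that fall just outside a tight detection, though the thesis may handle this differently.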
2

Single Camera Autonomous Navigation for Micro Aerial Vehicles

Bowen, Jacob 15 December 2012 (has links)
Micro Aerial Vehicles (MAVs) provide a highly capable, agile platform, ideally suited for intelligence/surveillance/reconnaissance missions, urban search and rescue, and scientific exploration. Critical to the success of these tasks is a system which moves autonomously through an unknown, obstacle-strewn, GPS-denied environment. Classical simultaneous localization and mapping (SLAM) approaches rely on large, heavy sensors to generate 3-D information about a MAV’s surroundings, severely limiting its abilities. This motivates a study of Parallel Tracking and Mapping (PTAM), an algorithm requiring only a single camera to provide 3-D data to an autonomous navigation system. Metric properties of 3-D MAV pose estimates are compared with physical measurements to explore tracking accuracy. Additionally, a discrete wavelet transform-based keypoint detector is implemented for a feasibility study on improving map density in low-visual-detail environments. Finally, a system is presented that integrates PTAM, autonomous MAV control, and a human interface for manual control and data logging.
3

Cluster-Based Salient Object Detection Using K-Means Merging and Keypoint Separation with Rectangular Centers

Buck, Robert 01 May 2016 (has links)
The explosion of internet traffic, the advent of social media sites such as Facebook and Twitter, and the increased availability of digital cameras have saturated life with images and videos. Never before has it been so important to sift quickly through large amounts of digital information. Salient Object Detection (SOD) is a computer vision topic concerned with methods for locating important objects in pictures. SOD has proven helpful in numerous applications such as image forgery detection and traffic sign recognition. In this thesis, I outline a novel SOD technique to automatically isolate important objects from the background in images.
4

Vyhledávání fotografií podle obsahu / Content Based Photo Search

Dvořák, Pavel January 2014 (has links)
This thesis covers the design and practical realization of a tool for quick search, based on image similarity, in large image databases containing from tens to hundreds of thousands of photos. The proposed technique uses various methods of descriptor extraction, the creation of Bag of Words dictionaries, and methods of storing image data in a PostgreSQL database. Further, experiments with the implemented software were carried out to evaluate the search time efficiency and the scaling possibilities of the designed solution.
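As a rough illustration of the Bag of Words idea mentioned in this abstract, the sketch below turns an image's local descriptors into a normalized histogram over a visual-word dictionary. It is a minimal sketch under assumptions not taken from the thesis: descriptors and dictionary entries are plain lists of floats, nearest-word assignment uses squared Euclidean distance, and the function names nearest_word and bow_histogram are hypothetical. Dictionary construction (typically by clustering) and PostgreSQL storage are not shown.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, vocabulary: Sequence[Vector]) -> int:
    """Return the index of the visual word closest to the descriptor."""
    def dist2(a: Vector, b: Vector) -> float:
        # Squared Euclidean distance; the square root is not needed for ranking.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(vocabulary)), key=lambda i: dist2(descriptor, vocabulary[i]))


def bow_histogram(descriptors: Sequence[Vector], vocabulary: Sequence[Vector]) -> List[float]:
    """Count descriptors per visual word and normalize the counts to sum to 1."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    if total == 0:
        # An image with no descriptors gets an all-zero histogram.
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]


if __name__ == "__main__":
    vocabulary = [[0.0, 0.0], [10.0, 10.0]]          # toy two-word dictionary
    descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
    print(bow_histogram(descriptors, vocabulary))
```

Two images can then be compared by a distance between their histograms, which is what makes a fixed-length representation like this convenient to store and index in a database.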
5

Direction estimation using visual odometry / Uppskattning av riktning med visuell odometri

Masson, Clément January 2015 (has links)
This Master's thesis tackles the problem of measuring objects’ directions from a motionless observation point. A new method is proposed, based on a single rotating camera and requiring knowledge of the directions of only two (or more) landmarks. In a first phase, multi-view geometry is used to estimate camera rotations and key elements’ directions from a set of overlapping images. Then, in a second phase, the direction of any object can be estimated by resectioning the camera associated with a picture showing this object. A detailed description of the algorithmic chain is given, along with test results on both synthetic data and real images taken with an infrared camera.
6

Deep Image Processing with Spatial Adaptation and Boosted Efficiency & Supervision for Accurate Human Keypoint Detection and Movement Dynamics Tracking

Dai, Chao Yang 05 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / This thesis aims to design and develop the spatial adaptation approach through spatial transformers to improve the accuracy of human keypoint recognition models. We have studied different model types and design choices to gain an accuracy increase over models without spatial transformers and analyzed how spatial transformers increase the accuracy of predictions. A neural network called Widenet has been leveraged as a specialized network for providing the parameters for the spatial transformer. Further, we have evaluated methods to reduce the model parameters, as well as the strategy to enhance the learning supervision for further improving the performance of the model. Our experiments and results have shown that the proposed deep learning framework can effectively detect the human key points, compared with the baseline methods. Also, we have reduced the model size without significantly impacting the performance, and the enhanced supervision has improved the performance. This study is expected to greatly advance the deep learning of human key points and movement dynamics.
7

Detecção de objetos por reconhecimento de grafos-chave / Object detection by keygraph recognition

Hashimoto, Marcelo 27 April 2012 (has links)
Object detection is a classic problem in computer vision, present in applications such as automated surveillance, medical image analysis and information retrieval. Among the existing approaches in the literature to solve this problem, we can highlight methods based on keypoint recognition that can be interpreted as different implementations of the same framework. The objective of this PhD thesis is to develop and evaluate a generalized version of this framework, in which keypoint recognition is replaced by keygraph recognition. The potential of the research resides in the richness of information that a graph can present before and after being recognized. The difficulty of the research resides in the problems that can be caused by this richness, such as the curse of dimensionality and computational complexity. Three contributions are included in the thesis: the detailed description of a keygraph-based framework for object detection, faithful implementations that demonstrate its feasibility, and experimental results that demonstrate its performance.
9

Interpretable Fine-Grained Visual Categorization

Guo, Pei 16 June 2021 (has links)
Not all categories are created equal in object recognition. Fine-grained visual categorization (FGVC) is a branch of visual object recognition that aims to distinguish subordinate categories within a basic-level category. Examples include classifying an image of a bird into specific species like "Western Gull" or "California Gull". Such subordinate categories exhibit characteristics like small inter-category variation and large intra-class variation, making distinguishing them extremely difficult. To address such challenges, an algorithm should be able to focus on object parts and be invariant to object pose. Like many other computer vision tasks, FGVC has witnessed phenomenal advancement following the resurgence of deep neural networks. However, the proposed deep models are usually treated as black boxes. Network interpretation and understanding aims to unveil the features learned by neural networks and explain the reason behind network decisions. It is not only a necessary component for building trust between humans and algorithms, but also an essential step towards continuous improvement in this field. This dissertation is a collection of papers that contribute to FGVC and neural network interpretation and understanding. Our first contribution is an algorithm named Pose and Appearance Integration for Recognizing Subcategories (PAIRS) which performs pose estimation and generates a unified object representation as the concatenation of pose-aligned region features. As the second contribution, we propose the task of semantic network interpretation. For filter interpretation, we represent the concepts a filter detects using an attribute probability density function. We propose the task of semantic attribution using textual summarization that generates an explanatory sentence consisting of the most important visual attributes for decision-making, as found by a general Bayesian inference algorithm. 
Pooling has been a key component in convolutional neural networks and is of special interest in FGVC. Our third contribution is an empirical and experimental study towards a thorough yet intuitive understanding and extensive benchmark of popular pooling approaches. Our fourth contribution is a novel LMPNet for weakly-supervised keypoint discovery. A novel leaky max pooling layer is proposed to explicitly encourage sparse feature maps to be learned. A learnable clustering layer is proposed to group the keypoint proposals into final keypoint predictions. 2020 marks the 10th year since the beginning of fine-grained visual categorization. It is of great importance to summarize the representative works in this domain. Our last contribution is a comprehensive survey of FGVC containing nearly 200 relevant papers that cover 7 common themes.
10

Adaptive registration using 2D and 3D features for indoor scene reconstruction. / Registro adaptativo usando características 2D e 3D para reconstrução de cenas em ambientes internos.

Perafán Villota, Juan Carlos 27 October 2016 (has links)
Pairwise alignment between point clouds is an important task in building 3D maps of indoor environments with partial information. The combination of 2D local features with depth information provided by RGB-D cameras is often used to improve such alignment. However, under varying lighting or low visual texture, indoor pairwise frame registration with sparse 2D local features is not a particularly robust method. In these conditions, features are hard to detect, thus leading to misalignment between consecutive pairs of frames. The use of 3D local features can be a solution, as such features come from the 3D points themselves and are resistant to variations in visual texture and illumination. Because varying conditions in real indoor scenes are unavoidable, we propose a new framework to improve the pairwise frame alignment using an adaptive combination of sparse 2D and 3D features based on both the levels of geometric structure and visual texture contained in each scene. Experiments with datasets including unrestricted RGB-D camera motion and natural changes in illumination show that the proposed framework convincingly outperforms methods using 2D or 3D features separately, as reflected in a better level of alignment accuracy.
