Global ETD Search

1	Novos descritores de textura para localização e identificação de objetos em imagens usando Bag-of-Features / New texture descriptors for locating and identifying objects in images using Bag-of-Features Ferraz, Carolina Toledo 02 September 2016 (has links) Descritores de características locais de imagens utilizados na representação de objetos têm se tornado muito populares nos últimos anos. Tais descritores têm a capacidade de caracterizar o conteúdo da imagem em dados compactos e discriminativos. As informações extraídas dos descritores são representadas por meio de vetores de características e são utilizados em várias aplicações, tais como reconhecimento de faces, cenas complexas e texturas. Neste trabalho foi explorada a análise e modelagem de descritores locais para caracterização de imagens invariantes a escala, rotação, iluminação e mudanças de ponto de vista. Esta tese apresenta três novos descritores locais que contribuem com o avanço das pesquisas atuais na área de visão computacional, desenvolvendo novos modelos para a caracterização de imagens e reconhecimento de imagens. A primeira contribuição desta tese é referente ao desenvolvimento de um descritor de imagens baseado no mapeamento das diferenças de nível de cinza, chamado Center-Symmetric Local Mapped Pattern (CS-LMP). O descritor proposto mostrou-se robusto a mudanças de escala, rotação, iluminação e mudanças parciais de ponto de vista, e foi comparado aos descritores Center-Symmetric Local Binary Pattern (CS-LBP) e Scale-Invariant Feature Transform (SIFT). A segunda contribuição é uma modificação do descritor CS-LMP, e foi denominada Modified Center-Symmetric Local Mapped Pattern (MCS-LMP). O descritor inclui o cálculo do pixel central na modelagem matemática, caracterizando melhor o conteúdo da mesma. O descritor proposto apresentou resultados superiores aos descritores CS-LMP, SIFT e LIOP na avaliação de reconhecimento de cenas complexas. A terceira contribuição é o desenvolvimento de um descritor de imagens chamado Mean-Local Mapped Pattern (M-LMP) que captura de modo mais fiel pequenas transições dos pixels na imagem, resultando em um número maior de \"matches\" corretos do que os descritores CS-LBP e SIFT. Além disso, foram realizados experimentos para classificação de objetos usando as base de imagens Caltech e Pascal VOC2006, apresentando melhores resultados comparando aos outros descritores em questão. Tal descritor foi proposto com a observação de que o descritor LBP pode gerar ruídos utilizando apenas a comparação dos vizinhos com o pixel central. O descritor M-LMP insere em sua modelagem matemática o cálculo da média dos pixels da vizinhança, com o objetivo de evitar ruídos e deixar as características mais robustas. Os descritores foram desenvolvidos de tal forma que seja possível uma redução de dimensionalidade de maneira simples e sem a necessidade de aplicação de técnicas como o PCA. Os resultados desse trabalho mostraram que os descritores propostos foram robustos na descrição das imagens, quantificando a similaridade entre as imagens por meio da abordagem Bag-of-Features (BoF), e com isso, apresentando resultados computacionais relevantes para a área de pesquisa. / Local feature descriptors used in objects representation have become very popular in recent years. Such descriptors have the ability to characterize the image content in compact and discriminative data. The information extracted from descriptors is represented by feature vectors and is used in various applications such as face recognition, complex scenes and textures. In this work we explored the analysis and modeling of local descriptors to characterize invariant scale images, rotation, changes in illumination and viewpoint. This thesis presents three new local descriptors that contribute to the current research advancement in computer vision area, developing new models for the characterization of images and image recognition. The first contribution is the development of a descriptor based on the mapping of gray-level-differences, called Center-Symmetric Local Mapped Pattern (CS-LMP). The proposed descriptor showed to be invariant to scale change, rotation, illumination and partial changes of viewpoint and compared to the descriptors Center-Symmetric Local Binary Pattern (CS-LBP) and Scale-Invariant Feature Trans- form (SIFT). The second contribution is a modification of the CS-LMP descriptor, which we call Modified Center-Symmetric Local Mapped Pattern (MCS-LMP). The descriptor includes the central pixel in mathematical modeling to better characterize the image content. The proposed descriptor presented superior results to CS-LMP , SIFT and LIOP descriptors in evaluating recognition of complex scenes. The third proposal includes the development of an image descriptor called Mean-Local Mapped Pattern (M-LMP) capturing more accurately small transitions of pixels in the image, resulting in a greater number of \"matches\" correct than CS-LBP and SIFT descriptors. In addition, experiments for classifying objects have been achieved by using the images based Caltech and Pascal VOC2006, presenting better results compared to other descriptors in question. This descriptor was proposed with the observation that the LBP descriptor can gene- rate noise using only the comparison of the neighbors to the central pixel. The M-LMP descriptor inserts in their mathematical modeling the averaging of the pixels of the neighborhood, in order to avoid noise and leave the more robust features. The results of this thesis showed that the proposed descriptors were robust in the description of the images, quantifying the similarity between images using the Bag-of-Features approach (BoF), and thus, presenting relevant computational results for the research area. Bag-of-Features Support Vector Machine Bag-of-Features Descritores locais Local descriptors Support Vector Machine
2	Novos descritores de textura para localização e identificação de objetos em imagens usando Bag-of-Features / New texture descriptors for locating and identifying objects in images using Bag-of-Features Carolina Toledo Ferraz 02 September 2016 (has links) Descritores de características locais de imagens utilizados na representação de objetos têm se tornado muito populares nos últimos anos. Tais descritores têm a capacidade de caracterizar o conteúdo da imagem em dados compactos e discriminativos. As informações extraídas dos descritores são representadas por meio de vetores de características e são utilizados em várias aplicações, tais como reconhecimento de faces, cenas complexas e texturas. Neste trabalho foi explorada a análise e modelagem de descritores locais para caracterização de imagens invariantes a escala, rotação, iluminação e mudanças de ponto de vista. Esta tese apresenta três novos descritores locais que contribuem com o avanço das pesquisas atuais na área de visão computacional, desenvolvendo novos modelos para a caracterização de imagens e reconhecimento de imagens. A primeira contribuição desta tese é referente ao desenvolvimento de um descritor de imagens baseado no mapeamento das diferenças de nível de cinza, chamado Center-Symmetric Local Mapped Pattern (CS-LMP). O descritor proposto mostrou-se robusto a mudanças de escala, rotação, iluminação e mudanças parciais de ponto de vista, e foi comparado aos descritores Center-Symmetric Local Binary Pattern (CS-LBP) e Scale-Invariant Feature Transform (SIFT). A segunda contribuição é uma modificação do descritor CS-LMP, e foi denominada Modified Center-Symmetric Local Mapped Pattern (MCS-LMP). O descritor inclui o cálculo do pixel central na modelagem matemática, caracterizando melhor o conteúdo da mesma. O descritor proposto apresentou resultados superiores aos descritores CS-LMP, SIFT e LIOP na avaliação de reconhecimento de cenas complexas. A terceira contribuição é o desenvolvimento de um descritor de imagens chamado Mean-Local Mapped Pattern (M-LMP) que captura de modo mais fiel pequenas transições dos pixels na imagem, resultando em um número maior de \"matches\" corretos do que os descritores CS-LBP e SIFT. Além disso, foram realizados experimentos para classificação de objetos usando as base de imagens Caltech e Pascal VOC2006, apresentando melhores resultados comparando aos outros descritores em questão. Tal descritor foi proposto com a observação de que o descritor LBP pode gerar ruídos utilizando apenas a comparação dos vizinhos com o pixel central. O descritor M-LMP insere em sua modelagem matemática o cálculo da média dos pixels da vizinhança, com o objetivo de evitar ruídos e deixar as características mais robustas. Os descritores foram desenvolvidos de tal forma que seja possível uma redução de dimensionalidade de maneira simples e sem a necessidade de aplicação de técnicas como o PCA. Os resultados desse trabalho mostraram que os descritores propostos foram robustos na descrição das imagens, quantificando a similaridade entre as imagens por meio da abordagem Bag-of-Features (BoF), e com isso, apresentando resultados computacionais relevantes para a área de pesquisa. / Local feature descriptors used in objects representation have become very popular in recent years. Such descriptors have the ability to characterize the image content in compact and discriminative data. The information extracted from descriptors is represented by feature vectors and is used in various applications such as face recognition, complex scenes and textures. In this work we explored the analysis and modeling of local descriptors to characterize invariant scale images, rotation, changes in illumination and viewpoint. This thesis presents three new local descriptors that contribute to the current research advancement in computer vision area, developing new models for the characterization of images and image recognition. The first contribution is the development of a descriptor based on the mapping of gray-level-differences, called Center-Symmetric Local Mapped Pattern (CS-LMP). The proposed descriptor showed to be invariant to scale change, rotation, illumination and partial changes of viewpoint and compared to the descriptors Center-Symmetric Local Binary Pattern (CS-LBP) and Scale-Invariant Feature Trans- form (SIFT). The second contribution is a modification of the CS-LMP descriptor, which we call Modified Center-Symmetric Local Mapped Pattern (MCS-LMP). The descriptor includes the central pixel in mathematical modeling to better characterize the image content. The proposed descriptor presented superior results to CS-LMP , SIFT and LIOP descriptors in evaluating recognition of complex scenes. The third proposal includes the development of an image descriptor called Mean-Local Mapped Pattern (M-LMP) capturing more accurately small transitions of pixels in the image, resulting in a greater number of \"matches\" correct than CS-LBP and SIFT descriptors. In addition, experiments for classifying objects have been achieved by using the images based Caltech and Pascal VOC2006, presenting better results compared to other descriptors in question. This descriptor was proposed with the observation that the LBP descriptor can gene- rate noise using only the comparison of the neighbors to the central pixel. The M-LMP descriptor inserts in their mathematical modeling the averaging of the pixels of the neighborhood, in order to avoid noise and leave the more robust features. The results of this thesis showed that the proposed descriptors were robust in the description of the images, quantifying the similarity between images using the Bag-of-Features approach (BoF), and thus, presenting relevant computational results for the research area. Bag-of-Features Support Vector Machine Descritores locais Bag-of-Features Local descriptors Support Vector Machine
3	Local Part Model for Action Recognition in Realistic Videos Shi, Feng 27 May 2014 (has links) This thesis presents a framework for automatic recognition of human actions in uncontrolled, realistic video data such as movies, internet and surveillance videos. In this thesis, the human action recognition problem is solved from the perspective of local spatio-temporal feature and bag-of-features representation. The bag-of-features model only contains statistics of unordered low-level primitives, and any information concerning temporal ordering and spatial structure is lost. To address this issue, we proposed a novel multiscale local part model on the purpose of maintaining both structure information and ordering of local events for action recognition. The method includes both a coarse primitive level root feature covering event-content statistics and higher resolution overlapping part features incorporating local structure and temporal relationships. To extract the local spatio-temporal features, we investigated a random sampling strategy for efficient action recognition. We also introduced the idea of using very high sampling density for efficient and accurate classification. We further explored the potential of the method with the joint optimization of two constraints: the classification accuracy and its efficiency. On the performance side, we proposed a new local descriptor, called GBH, based on spatial and temporal gradients. It significantly improved the performance of the pure spatial gradient-based HOG descriptor on action recognition while preserving high computational efficiency. We have also shown that the performance of the state-of-the-art MBH descriptor can be improved with a discontinuity-preserving optical flow algorithm. In addition, a new method based on histogram intersection kernel was introduced to combine multiple channels of different descriptors. This method has the advantages of improving recognition accuracy with multiple descriptors and speeding up the classification process. On the efficiency side, we applied PCA to reduce the feature dimension which resulted in fast bag-of-features matching. We also evaluated the FLANN method on real-time action recognition. We conducted extensive experiments on real-world videos from challenging public action datasets. We showed that our methods achieved the state-of-the-art with real-time computational potential, thus highlighting the effectiveness and efficiency of the proposed methods. bag-of-features (BoF) action recognition random sampling local part model multi-channel SVM
4	Modeling Time Series Data for Supervised Learning January 2012 (has links) abstract: Temporal data are increasingly prevalent and important in analytics. Time series (TS) data are chronological sequences of observations and an important class of temporal data. Fields such as medicine, finance, learning science and multimedia naturally generate TS data. Each series provide a high-dimensional data vector that challenges the learning of the relevant patterns This dissertation proposes TS representations and methods for supervised TS analysis. The approaches combine new representations that handle translations and dilations of patterns with bag-of-features strategies and tree-based ensemble learning. This provides flexibility in handling time-warped patterns in a computationally efficient way. The ensemble learners provide a classification framework that can handle high-dimensional feature spaces, multiple classes and interaction between features. The proposed representations are useful for classification and interpretation of the TS data of varying complexity. The first contribution handles the problem of time warping with a feature-based approach. An interval selection and local feature extraction strategy is proposed to learn a bag-of-features representation. This is distinctly different from common similarity-based time warping. This allows for additional features (such as pattern location) to be easily integrated into the models. The learners have the capability to account for the temporal information through the recursive partitioning method. The second contribution focuses on the comprehensibility of the models. A new representation is integrated with local feature importance measures from tree-based ensembles, to diagnose and interpret time intervals that are important to the model. Multivariate time series (MTS) are especially challenging because the input consists of a collection of TS and both features within TS and interactions between TS can be important to models. Another contribution uses a different representation to produce computationally efficient strategies that learn a symbolic representation for MTS. Relationships between the multiple TS, nominal and missing values are handled with tree-based learners. Applications such as speech recognition, medical diagnosis and gesture recognition are used to illustrate the methods. Experimental results show that the TS representations and methods provide better results than competitive methods on a comprehensive collection of benchmark datasets. Moreover, the proposed approaches naturally provide solutions to similarity analysis, predictive pattern discovery and feature selection. / Dissertation/Thesis / Ph.D. Industrial Engineering 2012 Industrial engineering Information science Statistics bag-of-features codebook feature extraction pattern discovery supervised learning time series
5	Local Part Model for Action Recognition in Realistic Videos Shi, Feng January 2014 (has links) This thesis presents a framework for automatic recognition of human actions in uncontrolled, realistic video data such as movies, internet and surveillance videos. In this thesis, the human action recognition problem is solved from the perspective of local spatio-temporal feature and bag-of-features representation. The bag-of-features model only contains statistics of unordered low-level primitives, and any information concerning temporal ordering and spatial structure is lost. To address this issue, we proposed a novel multiscale local part model on the purpose of maintaining both structure information and ordering of local events for action recognition. The method includes both a coarse primitive level root feature covering event-content statistics and higher resolution overlapping part features incorporating local structure and temporal relationships. To extract the local spatio-temporal features, we investigated a random sampling strategy for efficient action recognition. We also introduced the idea of using very high sampling density for efficient and accurate classification. We further explored the potential of the method with the joint optimization of two constraints: the classification accuracy and its efficiency. On the performance side, we proposed a new local descriptor, called GBH, based on spatial and temporal gradients. It significantly improved the performance of the pure spatial gradient-based HOG descriptor on action recognition while preserving high computational efficiency. We have also shown that the performance of the state-of-the-art MBH descriptor can be improved with a discontinuity-preserving optical flow algorithm. In addition, a new method based on histogram intersection kernel was introduced to combine multiple channels of different descriptors. This method has the advantages of improving recognition accuracy with multiple descriptors and speeding up the classification process. On the efficiency side, we applied PCA to reduce the feature dimension which resulted in fast bag-of-features matching. We also evaluated the FLANN method on real-time action recognition. We conducted extensive experiments on real-world videos from challenging public action datasets. We showed that our methods achieved the state-of-the-art with real-time computational potential, thus highlighting the effectiveness and efficiency of the proposed methods. bag-of-features (BoF) action recognition random sampling local part model multi-channel SVM
6	Genre-based Video Clustering using Deep Learning : By Extraction feature using Object Detection and Action Recognition Vellala, Abhinay January 2021 (has links) Social media has become an integral part of the Internet. There have been users across the world sharing content like images, texts, videos, and so on. There is a huge amount of data being generated and it has become a challenge to the social media platforms to group the content for further usage like recommending a video. Especially, grouping videos based on similarity requires extracting features. This thesis investigates potential approaches to extract features that can help in determining the similarity between videos. Features of given videos are extracted using Object Detection and Action Recognition. Bag-of-features representation is used to build the vocabulary of all the features and transform data that can be useful in clustering videos. Probabilistic model-based clustering, Multinomial Mixture model is used to determine the underlying clusters within the data by maximizing the expected log-likelihood and estimating the parameters of data as well as probabilities of clusters. Analysis of clusters is done to understand the genre based on dominant actions and objects. Bayesian Information Criterion(BIC) and Akaike Information Criterion(AIC) are used to determine the optimal number of clusters within the given videos. AIC/BIC scores achieved minimum scores at 32 clusters which are chosen to be the optimal number of clusters. The data is labeled with the genres and Logistic regression is performed to check the cluster performance on test data and has achieved 96% accuracy Bag-of-features Object detection Action recognition Multinomial mixture model Probability Theory and Statistics Sannolikhetsteori och statistik
7	Geometry-Aware Learning Algorithms for Histogram Data Using Adaptive Metric Embeddings and Kernel Functions / 距離の適応埋込みとカーネル関数を用いたヒストグラムデータからの幾何認識学習アルゴリズム Le, Thanh Tam 25 January 2016 (has links) 京都大学 / 0048 / 新制・課程博士 / 博士(情報学) / 甲第19417号 / 情博第596号 / 新制\|\|情\|\|104(附属図書館) / 32442 / 京都大学大学院情報学研究科知能情報学専攻 / (主査)教授山本章博, 教授黒橋禎夫, 教授鹿島久嗣, 准教授 Cuturi Marco / 学位規則第4条第1項該当 / Doctor of Informatics / Kyoto University / DFAM Metric learning Kernel method Histogram Aitchison geometry the probability simplex Bag of features Riemannian manifold 007
8	Real-time Hand Gesture Detection and Recognition for Human Computer Interaction Dardas, Nasser Hasan Abdel-Qader 08 November 2012 (has links) This thesis focuses on bare hand gesture recognition by proposing a new architecture to solve the problem of real-time vision-based hand detection, tracking, and gesture recognition for interaction with an application via hand gestures. The first stage of our system allows detecting and tracking a bare hand in a cluttered background using face subtraction, skin detection and contour comparison. The second stage allows recognizing hand gestures using bag-of-features and multi-class Support Vector Machine (SVM) algorithms. Finally, a grammar has been developed to generate gesture commands for application control. Our hand gesture recognition system consists of two steps: offline training and online testing. In the training stage, after extracting the keypoints for every training image using the Scale Invariance Feature Transform (SIFT), a vector quantization technique will map keypoints from every training image into a unified dimensional histogram vector (bag-of-words) after K-means clustering. This histogram is treated as an input vector for a multi-class SVM to build the classifier. In the testing stage, for every frame captured from a webcam, the hand is detected using my algorithm. Then, the keypoints are extracted for every small image that contains the detected hand posture and fed into the cluster model to map them into a bag-of-words vector, which is fed into the multi-class SVM classifier to recognize the hand gesture. Another hand gesture recognition system was proposed using Principle Components Analysis (PCA). The most eigenvectors and weights of training images are determined. In the testing stage, the hand posture is detected for every frame using my algorithm. Then, the small image that contains the detected hand is projected onto the most eigenvectors of training images to form its test weights. Finally, the minimum Euclidean distance is determined among the test weights and the training weights of each training image to recognize the hand gesture. Two application of gesture-based interaction with a 3D gaming virtual environment were implemented. The exertion videogame makes use of a stationary bicycle as one of the main inputs for game playing. The user can control and direct left-right movement and shooting actions in the game by a set of hand gesture commands, while in the second game, the user can control and direct a helicopter over the city by a set of hand gesture commands. Posture recognition Gesture recognition Scale Invariant Feature Transform (SIFT) K-means Bag-of-features Support Vector Machine (SVM) Human-computer interaction
9	Real-time Hand Gesture Detection and Recognition for Human Computer Interaction Dardas, Nasser Hasan Abdel-Qader 08 November 2012 (has links) This thesis focuses on bare hand gesture recognition by proposing a new architecture to solve the problem of real-time vision-based hand detection, tracking, and gesture recognition for interaction with an application via hand gestures. The first stage of our system allows detecting and tracking a bare hand in a cluttered background using face subtraction, skin detection and contour comparison. The second stage allows recognizing hand gestures using bag-of-features and multi-class Support Vector Machine (SVM) algorithms. Finally, a grammar has been developed to generate gesture commands for application control. Our hand gesture recognition system consists of two steps: offline training and online testing. In the training stage, after extracting the keypoints for every training image using the Scale Invariance Feature Transform (SIFT), a vector quantization technique will map keypoints from every training image into a unified dimensional histogram vector (bag-of-words) after K-means clustering. This histogram is treated as an input vector for a multi-class SVM to build the classifier. In the testing stage, for every frame captured from a webcam, the hand is detected using my algorithm. Then, the keypoints are extracted for every small image that contains the detected hand posture and fed into the cluster model to map them into a bag-of-words vector, which is fed into the multi-class SVM classifier to recognize the hand gesture. Another hand gesture recognition system was proposed using Principle Components Analysis (PCA). The most eigenvectors and weights of training images are determined. In the testing stage, the hand posture is detected for every frame using my algorithm. Then, the small image that contains the detected hand is projected onto the most eigenvectors of training images to form its test weights. Finally, the minimum Euclidean distance is determined among the test weights and the training weights of each training image to recognize the hand gesture. Two application of gesture-based interaction with a 3D gaming virtual environment were implemented. The exertion videogame makes use of a stationary bicycle as one of the main inputs for game playing. The user can control and direct left-right movement and shooting actions in the game by a set of hand gesture commands, while in the second game, the user can control and direct a helicopter over the city by a set of hand gesture commands. Posture recognition Gesture recognition Scale Invariant Feature Transform (SIFT) K-means Bag-of-features Support Vector Machine (SVM) Human-computer interaction
10	Segmentation d'objets déformables en imagerie ultrasonore / Deformable object segmentation in ultra-sound images Massich, Joan 04 December 2013 (has links) Le cancer du sein est le type de cancer le plus répandu, il est la cause principale de mortalité chez les femmes aussi bien dans les pays occidentaux que dans les pays en voie de développement. L'imagerie médicale joue un rôle clef dans la réduction de la mortalité du cancer du sein, en facilitant sa première détection par le dépistage, le diagnostic et la biopsie guidée. Bien que la Mammographie Numérique (DM) reste la référence pour les méthodes d'examen existantes, les échographies ont prouvé leur place en tant que modalité complémentaire. Les images de cette dernière fournissent des informations permettant de différencier le caratère bénin ou malin des lésions solides, ce qui ne peut être détecté par DM. Malgré leur utilité clinique, les images échographiques sont bruitées, ce qui compromet les diagnostiques des radiologues à partir de celles ci. C'est pourquoi un des objectifs premiers des chercheurs en imagerie médicale est d'améliorer la qualité des images et des méthodologies afin de simplifier et de systématiser la lecture et l'interprétation de ces images.La méthode proposée considère le processus de segmentation comme la minimisation d'une structure probabilistique multi-label utilisant un algorithme de minimisation du Max-Flow/Min-Cut pour associer le label adéquat parmi un ensemble de labels figurant des types de tissus, et ce, pour tout les pixels de l'image.Cette dernière est divisée en régions adjacentes afin que tous les pixels d'une même régions soient labelisés de la même manière en fin du processus. Des modèles stochastiques pour la labellisation sont crées à partir d'une base d'apprentissage de données. / Breast cancer is the second most common type of cancer being the leading cause of cancer death among females both in western and in economically developing countries. Medical imaging is key for early detection, diagnosis and treatment follow-up. Despite Digital Mammography (DM) remains the reference imaging modality, Ultra-Sound (US) imaging has proven to be a successful adjunct image modality for breast cancer screening, specially as a consequence of the discriminative capabilities that US offers for differentiating between solid lesions that are benign or malignant. Despite US usability,US suffers inconveniences due to its natural noise that compromises the diagnosis capabilities of radiologists. Therefore the research interest in providing radiologists with Computer Aided Diagnosis (CAD) tools to assist the doctors during decision taking. This thesis analyzes the current strategies to segment breast lesions in US data in order to infer meaningful information to be feet to CAD, and proposes a fully automatic methodology for generating accurate segmentations of breast lesions in US data with low false positive rates. Ultrasonore Ultrasound Breast cancer Segmentation Graph-cuts Optimization Bag-of-words Bag-of-features Features Machine learning Medical imaging 620 610.2

Search results