1 |
Content based video retrieval via spatial-temporal information discovery. Wang, Lei. January 2013.
Content based video retrieval (CBVR) has been strongly motivated by a variety of real-world applications. Most state-of-the-art CBVR systems are built on the Bag-of-Visual-Words (BoVW) framework for representing and accessing visual resources. The framework, however, ignores the spatial and temporal information contained in videos, which plays a fundamental role in unveiling semantic meanings. This information includes not only the spatial layout of visual content on a still frame (image), but also temporal changes across sequential frames. Specifically, spatially and temporally co-occurring visual words, extracted under the BoVW framework, often tend to collaboratively represent objects, scenes, or events in videos. Discovering this spatial and temporal information would help advance CBVR technology. In this thesis, we propose to explore and analyse the spatial and temporal information from a new perspective: i) co-occurrence of the visual words is formulated as a correlation matrix; ii) spatial proximity and temporal coherence are analytically and empirically studied to refine this correlation. Following this, a quantitative spatial and temporal correlation (STC) model is defined. The STC discovered from either the query example (denoted by QC) or the data collection (denoted by DC) is assumed to determine the specificity of the visual words in the retrieval model, i.e., selected words of interest are found to be more important for certain topics. Based on this hypothesis, we utilized the STC matrix to establish a novel visual content similarity measurement method and a query reformulation scheme for the retrieval model. Additionally, the STC also characterizes the context of the visual words, and accordingly an STC-based context similarity measurement is proposed to detect synonymous visual words. The method partially solves an inherent error of the visual vocabulary under the BoVW framework. Systematic experimental evaluations on the public TRECVID and CC_WEB_VIDEO video collections demonstrate that the proposed STC-based methods can substantially improve the retrieval effectiveness of the BoVW framework. The retrieval model based on the STC outperforms state-of-the-art CBVR methods on these data collections without additional storage or computational expense. Furthermore, the visual vocabulary rebuilt in this thesis is more compact and effective. The above methods can be incorporated together to implement an effective and efficient CBVR system. Based on the experimental results, it is concluded that the spatial-temporal correlation effectively approximates the semantic correlation. This discovered correlation approximation can be utilized for both visual content representation and similarity measurement, which are key issues for the development of CBVR technology.
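As a rough illustration of the co-occurrence formulation above, the sketch below accumulates a spatial-temporal co-occurrence matrix from per-frame lists of quantised visual words with pixel coordinates. The radius and window parameters and the cosine-style normalisation are illustrative assumptions, not the thesis's exact refinement.

```python
import numpy as np

def stc_matrix(frames, vocab_size, radius=40.0, window=2):
    """Accumulate a spatial-temporal co-occurrence matrix (illustrative).

    frames: list of per-frame detections, each a list of (word_id, x, y).
    Two visual words co-occur if they appear within `radius` pixels of
    each other on frames at most `window` apart.
    """
    C = np.zeros((vocab_size, vocab_size))
    for t, frame in enumerate(frames):
        # pool detections from the temporal window around frame t
        pool = [d for f in frames[max(0, t - window):t + window + 1] for d in f]
        for (w1, x1, y1) in frame:
            for (w2, x2, y2) in pool:  # O(n^2) per window; fine for a sketch
                if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= radius ** 2:
                    C[w1, w2] += 1.0
    # symmetrize and normalise counts into a correlation-like matrix
    C = (C + C.T) / 2.0
    norms = np.sqrt(np.outer(C.diagonal(), C.diagonal()))
    return np.divide(C, norms, out=np.zeros_like(C), where=norms > 0)
```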
|
2 |
Image Retrieval using Landmark Indexing for Indoor Navigation. Sinha, Dwaipayan. 25 April 2014.
A novel approach is proposed for real-time retrieval of images from a large database of overlapping images of an indoor environment. The procedure extracts visual features from images using selected computer vision techniques, and processes the extracted features to create a reduced list of features annotated with the frame numbers in which they appear. This method is named landmark indexing. Unlike some state-of-the-art approaches, the proposed method does not need to consider large image adjacency graphs, because the overlap of the images in the map sufficiently increases information gain, and mapping similar features to the same landmark reduces the search space and improves search efficiency. Empirical evidence from experiments on real datasets shows high (90-100%) accuracy in image retrieval, and an improvement in search time from the order of 100-200 milliseconds to the order of 10-30 milliseconds. The image retrieval technique is also demonstrated by integrating it into a 3D real-time navigation system. This system was tested in several indoor environments, and all experiments show accurate localization results in large indoor areas, with errors of only 15-20 centimeters. / Thesis (Master, Electrical & Computer Engineering), Queen's University, 2014.
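The landmark indexing idea lends itself to a compact sketch: descriptors from overlapping frames are merged into landmarks (here, naively, by k-means clustering, which is an assumption; the abstract does not specify the thesis's own merging procedure), and an inverted index maps each landmark to the frames it appears in, so a query reduces to voting over frames.

```python
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

def build_landmark_index(frame_descriptors, n_landmarks=500):
    """frame_descriptors: list of (frame_id, (n, d) descriptor array).
    Similar features across overlapping frames map to one landmark;
    the index records which frames each landmark appears in."""
    all_desc = np.vstack([d for _, d in frame_descriptors])
    km = KMeans(n_clusters=n_landmarks, n_init=4, random_state=0).fit(all_desc)
    index = defaultdict(set)
    for frame_id, desc in frame_descriptors:
        for lm in km.predict(desc):
            index[lm].add(frame_id)
    return km, index

def query(km, index, desc):
    """Rank frames by how many landmarks they share with the query."""
    votes = defaultdict(int)
    for lm in km.predict(desc):
        for frame_id in index[lm]:
            votes[frame_id] += 1
    return sorted(votes, key=votes.get, reverse=True)
```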
|
3 |
Detecção de cenas em segmentos semanticamente complexos / Detection of scenes in semantically complex segments. Lopes, Bruno Lorenço. 28 April 2014.
Many areas of computer science (content personalization and adaptation, information retrieval, among others) benefit from segmenting video into smaller units of information. The literature reports many techniques and methods whose goal is to identify these units. One limitation of these techniques is that they do not handle scene detection in semantically complex segments, defined as video snippets that present more than one subject or theme and whose latent semantics can hardly be determined using a single medium. Such segments are highly relevant, since they are present in multiple video domains such as movies, news and even television commercials. This Master's dissertation proposes a video scene segmentation technique able to detect scenes in semantically complex segments. To achieve this goal it uses latent semantics extracted with the Bag of Visual Words to group a video's segments. The grouping is based on multimodality: visual and aural features are analysed for each video and the results are combined using a late fusion strategy. This work demonstrates the technical feasibility of recognizing scenes in semantically complex segments.
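A minimal sketch of the late-fusion grouping described above, assuming each segment is already represented by one visual and one aural BoW histogram; the modality weight, cosine similarity and average-linkage clustering are illustrative choices, not the dissertation's exact configuration.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.metrics.pairwise import cosine_similarity

def fuse_and_group(visual_bow, audio_bow, n_scenes, w_visual=0.5):
    """visual_bow, audio_bow: (n_segments, vocab) BoW histograms.
    Late fusion: each modality is scored independently and the
    similarity scores (not the raw features) are combined afterwards."""
    sim = (w_visual * cosine_similarity(visual_bow)
           + (1.0 - w_visual) * cosine_similarity(audio_bow))
    dist = 1.0 - sim  # fused similarities -> distances for clustering
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_scenes, criterion="maxclust")
```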
|
4 |
Learning to Predict Clinical Outcomes from Soft Tissue Sarcoma MRI. Farhidzadeh, Hamidreza. 06 November 2017.
Soft Tissue Sarcomas (STS) are among the most dangerous diseases, with a 50% mortality rate in the USA in 2016. Heterogeneous responses to the same treatment within a sub-type of STS, as well as intra-tumor heterogeneity, make the study of biopsies imprecise. Radiologists therefore seek non-invasive approaches to gather useful information regarding the characteristics and behaviors of STS tumors, such as aggressiveness and recurrence. Quantitative image analysis is an approach that integrates information extracted using data science techniques, such as data mining and machine learning, with biological and clinical data, to assist radiologists in making the best recommendations on clinical trials and the course of treatment.
New methods in "Radiomics" extract meaningful features from medical imaging data for diagnostic and prognostic goals. Furthermore, features extracted from Convolutional Neural Networks (CNNs) are demonstrating very powerful and robust performance in computer-aided decision systems (CADs). A well-known computer vision approach, Bag of Visual Words, has also recently been applied to imaging data for machine learning purposes, such as classifying different types of tumors based on their specific behavior and phenotype. These approaches have not been fully and widely investigated in STS.
This dissertation provides novel versions of image analysis based on Radiomics and on Bag of Visual Words integrated with deep features to quantify the heterogeneity of entire STS tumors, as well as of sub-regions that carry predictive and prognostic imaging features, from single- and multi-sequence Magnetic Resonance Imaging (MRI). STS are rarely addressed in quantitative cancer analysis compared with other cancers such as lung, brain and breast cancer. This dissertation performs a comprehensive analysis of the available data, in 2D and across multiple slices, to predict the behavior of STS with regard to clinical outcomes such as recurrence or metastasis and the amount of tumor necrosis.
The experimental results on the available datasets are promising: the ensemble of Bags of Visual Words framework integrated with deep features achieves 91.66% classification accuracy and 0.91 AUC for metastasis, and the Radiomics framework achieves 82.44% classification accuracy and 0.63 AUC for necrosis progression.
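A much-simplified sketch of one way to combine BoVW histograms with deep features for outcome prediction; here the two representations are simply concatenated and fed to a random forest, which approximates but does not reproduce the ensemble scheme of the dissertation, and all names and parameters are illustrative (binary 0/1 labels assumed).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

def bovw_histogram(patch_descriptors, kmeans):
    """Quantise one tumour's patch descriptors against the vocabulary."""
    words = kmeans.predict(patch_descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def predict_outcome(per_tumor_patches, deep_features, labels, vocab_size=64):
    # vocabulary from all training patches, then one histogram per tumour;
    # deep features (e.g. from a pretrained CNN) are appended per tumour
    kmeans = KMeans(n_clusters=vocab_size, n_init=4, random_state=0)
    kmeans.fit(np.vstack(per_tumor_patches))
    X = np.array([np.concatenate([bovw_histogram(p, kmeans), d])
                  for p, d in zip(per_tumor_patches, deep_features)])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    scores = cross_val_predict(clf, X, labels, cv=5,
                               method="predict_proba")[:, 1]
    return roc_auc_score(labels, scores)
```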
|
5 |
Investigating the relationship between the distribution of local semantic concepts and local keypoints for image annotation. Alqasrawi, Yousef T. N.; Neagu, Daniel. January 2014.
The problem of image annotation has gained increasing attention from many researchers in computer vision, yet few works have addressed the use of bags of visual words for scene annotation at the region level. The aim of this paper is to study the relationship between the distribution of local semantic concepts and the local keypoints located in image regions labelled with these semantic concepts. Based on this study, we investigate whether the bag of visual words model can be used to efficiently represent the content of natural scene image regions, so that images can be annotated with local semantic concepts. This paper also presents a local-from-global approach, which studies the influence of using visual vocabularies generated from general scene categories to build bags of visual words at the region level. Extensive experiments are conducted over a natural scene dataset with six categories. The reported results show the plausibility of using the BOW model to represent the semantic information of image regions.
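The region-level BoW representation studied in the paper can be sketched directly: only keypoints falling inside a labelled region contribute to that region's histogram. The helper below is a minimal illustration, assuming keypoints have already been quantised against a global visual vocabulary.

```python
import numpy as np

def region_bow(keypoint_xy, keypoint_words, region_mask, vocab_size):
    """BoW histogram for one labelled region.

    keypoint_xy: (n, 2) integer pixel coordinates (x, y)
    keypoint_words: (n,) visual-word ids from a global vocabulary
    region_mask: (H, W) boolean mask of the region
    """
    hist = np.zeros(vocab_size)
    for (x, y), w in zip(keypoint_xy, keypoint_words):
        if region_mask[y, x]:  # keep only keypoints inside the region
            hist[w] += 1.0
    return hist / max(hist.sum(), 1.0)
```

A per-region classifier trained on such histograms then yields the local semantic concept labels.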
|
6 |
Natural scene classification, annotation and retrieval: developing different approaches for semantic scene modelling based on Bag of Visual Words. Alqasrawi, Yousef T. N. January 2012.
With the availability of inexpensive hardware and software, digital imaging has become an important medium of communication in our daily lives. Huge numbers of digital images are being collected, made available through the internet and stored in various settings such as personal image collections, medical imaging and digital arts. It is therefore important to make sure that images are stored, searched and accessed in an efficient manner. The use of the bag of visual words (BOW) model, which represents images by local invariant features computed at interest point locations, has become a standard choice for many computer vision tasks. Based on this promising model, this thesis investigates three main problems: natural scene classification, annotation and retrieval. Given an image, the task is to design a system that can determine to which class the image belongs (classification), what semantic concepts it contains (annotation) and which images are most similar to it (retrieval).
This thesis contributes to scene classification by proposing a weighting approach, named the keypoints density-based weighting method (KDW), to control the fusion of colour information and bag of visual words on a spatial pyramid layout in a unified framework. Different configurations of the BOW, integrated visual vocabularies and multiple image descriptors are investigated and analyzed. The proposed approaches are extensively evaluated over three well-known scene classification datasets with 6, 8 and 15 scene categories, using 10-fold cross validation. The second contribution, the scene annotation task, explores whether the integrated visual vocabularies generated for scene classification can be used to model the local semantic information of natural scenes. In this direction, image annotation is treated as a classification problem: images are partitioned into a fixed 10x10 grid, and each block, represented by the BOW and different image descriptors, is classified into one of the predefined semantic classes. An image is then represented by the percentage of every semantic concept detected in it. Experimental results on 6 scene categories demonstrate the effectiveness of the proposed approach. Finally, the thesis further explores, with extensive experimental work, the use of different configurations of the BOW for natural scene retrieval. / Applied Science University in Jordan
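The grid-based annotation step lends itself to a compact sketch: the image is split into a 10x10 grid, each block becomes a BoW histogram, and a block classifier (assumed here to be any fitted classifier returning concept ids in 0..n_concepts-1, a placeholder for the thesis's own models) yields the per-concept percentages used as the image representation.

```python
import numpy as np

def annotate_by_grid(keypoint_xy, keypoint_words, img_shape, vocab_size,
                     block_classifier, n_concepts, grid=10):
    """Classify each grid block into a local semantic concept and
    describe the image by the fraction of blocks per concept."""
    h, w = img_shape
    counts = np.zeros(n_concepts)
    for i in range(grid):
        for j in range(grid):
            x0, x1 = j * w // grid, (j + 1) * w // grid
            y0, y1 = i * h // grid, (i + 1) * h // grid
            hist = np.zeros(vocab_size)
            for (x, y), word in zip(keypoint_xy, keypoint_words):
                if x0 <= x < x1 and y0 <= y < y1:
                    hist[word] += 1.0
            if hist.sum() > 0:
                hist /= hist.sum()  # normalised block-level BoW
            concept = block_classifier.predict(hist[None, :])[0]
            counts[concept] += 1
    return counts / counts.sum()
```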
|
7 |
Geo-spatial Object Detection Using Local Descriptors. Aytekin, Caglar. 01 August 2011.
There is an increasing trend towards object detection from aerial and satellite images. Most widely used object detection algorithms are based on local features. In such an approach, local features are first detected and described in an image; a representation of the image is then formed from these local features for supervised learning, and these representations are used during classification. In this thesis, the Harris and SIFT algorithms are used as local feature detectors, and the SIFT approach is used as the local feature descriptor. Using these tools, the Bag of Visual Words algorithm is examined in order to represent an image by histograms of visual words. Finally, an SVM classifier is trained using positive and negative samples from a training set. In addition to the classical bag of visual words approach, two novel extensions are proposed. In the first, the visual words are weighted in proportion to their importance for the positive samples; the important features are essentially those occurring more in the object and less in the background. In the second, principal component analysis is applied after forming the histograms, in order to remove undesired redundancy and noise in the data and to reduce its dimensionality, yielding better classification performance. Based on the test results, it can be argued that the proposed approach is capable of detecting a number of geo-spatial objects, such as airplanes or ships, with reasonable performance.
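A minimal sketch of the two extensions, assuming training images are already represented as BoW histograms: visual words are weighted by how much more often they occur in positive (object) than in negative (background) images, and PCA compresses the weighted histograms before the SVM. The 30-component choice is arbitrary and requires at least that many training samples.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def word_weights(pos_hists, neg_hists, eps=1e-6):
    """Weight each visual word by how much more often it occurs in
    object (positive) images than in background (negative) images."""
    pos = np.asarray(pos_hists).mean(axis=0)
    neg = np.asarray(neg_hists).mean(axis=0)
    return pos / (pos + neg + eps)

def train_detector(pos_hists, neg_hists, n_components=30):
    w = word_weights(pos_hists, neg_hists)
    X = np.vstack([pos_hists, neg_hists]) * w  # reweighted histograms
    y = np.r_[np.ones(len(pos_hists)), np.zeros(len(neg_hists))]
    model = make_pipeline(StandardScaler(),
                          PCA(n_components=n_components),
                          SVC(kernel="rbf"))
    return model.fit(X, y), w
```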
|
8 |
Approximate string matching distance for image classification / Distance d’édition entre chaines d’histogrammes pour la classification d’images. Nguyen, Hong-Thinh. 29 August 2014.
The exponential increase in the number of images requires efficient ways to classify them based on their visual content. The most successful and popular approach is the Bag of Visual Words (BoW) representation, due to its simplicity and robustness. Unfortunately, this approach fails to capture the spatial image layout, which plays an important role in modeling image categories. Lazebnik et al. (2006) introduced the Spatial Pyramid Representation (SPR), which successfully incorporated spatial information into the BoW model. The idea of their approach is to split the image into a pyramidal grid and to represent each grid cell as a BoW. Assuming that images belonging to the same class have similar spatial distributions, it is possible to use pairwise matching of cells as a similarity measurement. However, this rigid matching scheme prevents SPR from coping with image variations and transformations. The main objective of this dissertation is to study a more flexible string matching model. Keeping the idea of local BoW histograms, we introduce a new class of edit distances to compare strings of local histograms. Our first contribution is a string-based image representation model and a new edit distance (called SMD, for String Matching Distance) well suited to strings whose symbols are local BoWs. The new distance benefits from an efficient dynamic programming algorithm. A corresponding edit kernel, including both a weighting and a pyramidal scheme, is also derived. The performance is evaluated on classification tasks and compared to the standard method and several related methods. The new method outperforms the others thanks to its ability to detect and ignore identical successive regions inside images.
Our second contribution is an extended version of SMD that replaces insertion and deletion operations with merging operations between successive symbols. In this approach, the number of sub-regions, i.e. the grid divisions, may vary according to the visual content. We describe two algorithms to compute this merge-based distance. The first is a greedy version, which is efficient but can produce a non-optimal edit script. The other is an optimal version, but it requires fourth-degree polynomial complexity. All the proposed distances are evaluated on several datasets and are shown to outperform comparable existing methods.
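A minimal sketch of the dynamic-programming core of such a distance, assuming each "symbol" is a normalised local BoW histogram; the L1 substitution cost and the flat gap penalty are illustrative stand-ins for the weighting scheme actually derived in the thesis, and the merge-based variant would replace the two gap branches with merges of adjacent histograms.

```python
import numpy as np

def string_matching_distance(a, b, gap=1.0):
    """Edit distance between two strings of BoW histograms.

    a, b: sequences of normalised histograms (each a 1-D ndarray).
    Substitution costs the halved L1 distance between histograms
    (identical histograms cost 0, disjoint ones cost 1); insertions
    and deletions cost a flat `gap`. Standard O(len(a)*len(b)) DP.
    """
    n, m = len(a), len(b)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = gap * np.arange(n + 1)
    D[0, :] = gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.5 * np.abs(a[i - 1] - b[j - 1]).sum()
            D[i, j] = min(D[i - 1, j - 1] + sub,  # substitute
                          D[i - 1, j] + gap,      # delete from a
                          D[i, j - 1] + gap)      # insert from b
    return D[n, m]
```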
|