1 |
Learning Language-vision Correspondences
Jamieson, Michael, 15 February 2011 (has links)
Given an unstructured collection of captioned images of cluttered scenes featuring a variety of objects, our goal is to simultaneously learn the names and appearances of the objects. Only a small fraction of local features within any given image are associated with a particular caption word, and captions may contain irrelevant words not associated with any image object. We propose a novel algorithm that uses the repetition of feature neighborhoods across training images and a measure of correspondence with caption words to learn meaningful feature configurations (representing named objects). We also introduce a graph-based appearance model that captures some of the structure of an object by encoding the spatial relationships among the local visual features. In an iterative procedure we use language (the words) to drive a perceptual grouping process that assembles an appearance model for a named object. We also exploit co-occurrences among appearance models to learn hierarchical appearance models. Results of applying our method to three data sets in a variety of conditions demonstrate that from complex, cluttered, real-world scenes with noisy captions, we can learn both the names and appearances of objects, resulting in a set of models invariant to translation, scale, orientation, occlusion, and minor changes in viewpoint or articulation. These named models, in turn, are used to automatically annotate new, uncaptioned images, thereby facilitating keyword-based image retrieval.
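As a rough illustration of the word-appearance correspondence idea above (not the thesis' actual learning algorithm), the sketch below scores candidate visual patterns against caption words by how consistently they co-occur across captioned training images; the pattern detections and the F-style overlap score are assumptions made for the example.

```python
# Hypothetical sketch: score how well a candidate visual pattern corresponds to a
# caption word by how often the two co-occur across captioned training images.
# The overlap measure used here is an assumption, not the thesis' formulation.
from collections import defaultdict

def correspondence_scores(detections, captions):
    """detections: dict image_id -> set of pattern ids detected in that image.
       captions:   dict image_id -> set of caption words."""
    pattern_count = defaultdict(int)
    word_count = defaultdict(int)
    joint_count = defaultdict(int)
    for img, patterns in detections.items():
        words = captions.get(img, set())
        for p in patterns:
            pattern_count[p] += 1
        for w in words:
            word_count[w] += 1
        for p in patterns:
            for w in words:
                joint_count[(p, w)] += 1
    scores = {}
    for (p, w), joint in joint_count.items():
        precision = joint / pattern_count[p]   # P(word | pattern detected)
        recall = joint / word_count[w]         # P(pattern detected | word)
        scores[(p, w)] = 2 * precision * recall / (precision + recall)
    return scores

# Toy usage: pattern "g7" fires mostly in images captioned with "mug".
dets = {1: {"g7"}, 2: {"g7", "g2"}, 3: {"g2"}}
caps = {1: {"mug", "table"}, 2: {"mug"}, 3: {"lamp"}}
print(correspondence_scores(dets, caps))
```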
|
3 |
Automatic caption generation for news images
Feng, Yansong, January 2011 (has links)
This thesis is concerned with the task of automatically generating captions for images, which is important for many image-related applications. Automatic description generation for video frames would help security authorities manage and utilize large volumes of monitoring data more efficiently. Image search engines could potentially benefit from image descriptions in supporting more accurate and targeted queries for end users. Importantly, generating image descriptions would aid blind or partially sighted people who cannot access visual information in the same way as sighted people can. However, previous work has relied on fine-grained resources, manually created for specific domains and applications.

In this thesis, we explore the feasibility of automatic caption generation for news images in a knowledge-lean way. We depart from previous work, as we learn a model of caption generation from publicly available data that has not been explicitly labelled for our task. The model consists of two components, namely extracting image content and rendering it in natural language. Specifically, we exploit data resources where images and their textual descriptions co-occur naturally. We present a new dataset consisting of news articles, images, and their captions that we acquired from the BBC News website. Rather than laboriously annotating images with keywords, we simply treat the captions as the labels. We show that it is possible to learn the visual and textual correspondence under such noisy conditions by extending an existing generative annotation model (Lavrenko et al., 2003). We also find that the accompanying news documents substantially complement the extraction of the image content.

In order to provide better modelling and representation of image content, we propose a probabilistic image annotation model that exploits the synergy between visual and textual modalities under the assumption that images and their textual descriptions are generated by a shared set of latent variables (topics). Using Latent Dirichlet Allocation (Blei and Jordan, 2003), we represent visual and textual modalities jointly as a probability distribution over a set of topics. Our model takes these topic distributions into account while finding the most likely keywords for an image and its associated document. The availability of news documents in our dataset allows us to perform caption generation in a fashion akin to text summarization, save for one important difference: our model is not solely based on text but uses the image in order to select content from the document that should be present in the caption.

We propose both extractive and abstractive caption generation models to render the extracted image content in natural language without relying on rich knowledge resources, sentence templates or grammars. The backbone of both approaches is our topic-based image annotation model. Our extractive models examine how best to select sentences that overlap in content with the output of our image annotation model. We adapt an existing abstractive headline generation model to our scenario by incorporating visual information. Our own model operates over image description keywords and document phrases by taking dependency and word order constraints into account. Experimental results show that both approaches can generate human-readable captions for news images. Our phrase-based abstractive model yields captions as informative as those written by BBC journalists.
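To make the extractive side of this approach concrete, here is a minimal sketch, under assumed inputs, of selecting a caption sentence from the accompanying document by its overlap with keywords predicted by an annotation model; it is an illustration, not the thesis' actual extractive model.

```python
# Illustrative sketch (not the thesis' exact extractive model): rank document
# sentences by their overlap with keywords predicted for the image, and return
# the best-scoring sentence as the caption.
def extractive_caption(sentences, keyword_probs):
    """sentences: list of document sentences (strings).
       keyword_probs: dict word -> probability from the annotation model (assumed given)."""
    def score(sentence):
        words = sentence.lower().split()
        if not words:
            return 0.0
        # Average annotation probability over the sentence's words
        return sum(keyword_probs.get(w, 0.0) for w in words) / len(words)
    return max(sentences, key=score)

doc = ["The prime minister visited the flooded town on Tuesday.",
       "Residents were evacuated as the river burst its banks."]
probs = {"flooded": 0.4, "river": 0.3, "town": 0.2, "minister": 0.05}
print(extractive_caption(doc, probs))
```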
|
4 |
Information Mining of Image Annotation
Lai, Shih-jin, 02 July 2006 (has links)
Traditional content-based image retrieval supports image searches based on color, texture, and shape. However, it is difficult and unintuitive for most users to query images with these low-level features; most users prefer to search by keywords. For example, Google provides an image search service, but although it is called image search, it actually searches by keywords rather than by image content. For this reason, MPEG-7 now supports a textual annotation standard: the MPEG-7 Multimedia Description Schemes (DSs) are metadata structures for describing and annotating audio-visual (AV) content. However, manual annotation of images or video is time-consuming and expensive. We propose a system that helps produce suitable annotations automatically. We extract fractal features from the images and use the Diverse Density algorithm to train models. In this way, the user and the system can interact in real time. As the number of trained models in the database grows, the system's auto-annotation success rate increases.
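The Diverse Density idea mentioned above can be sketched as follows; this is a simplified illustration of the general multiple-instance formulation (Maron and Lozano-Perez), not the system's exact training procedure, and the feature values are made up.

```python
# Minimal sketch of Diverse Density: find a point in feature space that is close
# to at least one instance in every positive bag (image annotated with the word)
# and far from all instances in negative bags. Searching only over instances of
# positive bags is a common heuristic; feature extraction is assumed done elsewhere.
import math

def instance_prob(t, x, scale=1.0):
    # Probability that instance x matches concept point t, via a Gaussian-like kernel.
    d2 = sum((a - b) ** 2 for a, b in zip(t, x))
    return math.exp(-scale * d2)

def diverse_density(t, pos_bags, neg_bags):
    dd = 1.0
    for bag in pos_bags:          # noisy-OR: some instance in the bag matches t
        dd *= 1.0 - math.prod(1.0 - instance_prob(t, x) for x in bag)
    for bag in neg_bags:          # no instance in a negative bag should match t
        dd *= math.prod(1.0 - instance_prob(t, x) for x in bag)
    return dd

def best_concept(pos_bags, neg_bags):
    candidates = [x for bag in pos_bags for x in bag]
    return max(candidates, key=lambda t: diverse_density(t, pos_bags, neg_bags))

pos = [[(0.1, 0.2), (0.9, 0.9)], [(0.85, 0.95), (0.4, 0.1)]]
neg = [[(0.1, 0.1), (0.3, 0.4)]]
print(best_concept(pos, neg))   # expected: a point near (0.9, 0.9)
```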
|
5 |
Parallelizing support vector machines for scalable image annotation
Alham, Nasullah Khalid, January 2011 (has links)
Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among them, Support Vector Machines (SVMs) are used extensively due to their generalization properties. However, SVM training is a notably computationally intensive process, especially when the training dataset is large. In this thesis, distributed computing paradigms are investigated to speed up SVM training by partitioning a large training dataset into small data chunks and processing each chunk in parallel, utilizing the resources of a cluster of computers. A resource-aware parallel SVM algorithm is introduced for large-scale image annotation using a cluster of computers. A genetic-algorithm-based load balancing scheme is designed to optimize the performance of the algorithm in heterogeneous computing environments. SVM was initially designed for binary classification; however, most classification problems arising in domains such as image annotation usually involve more than two classes. A resource-aware parallel multiclass SVM algorithm for large-scale image annotation on a cluster of computers is therefore introduced. The combination of classifiers leads to a substantial reduction of classification error in a wide range of applications; in particular, SVM ensembles with bagging are shown to outperform a single SVM in terms of classification accuracy. However, training SVM ensembles is a notably computationally intensive process, especially when the number of replicated samples generated by bootstrapping is large. A distributed SVM ensemble algorithm for image annotation is introduced which re-samples the training data by bootstrapping and trains an SVM on each sample in parallel using a cluster of computers. The above algorithms are evaluated in both experimental and simulation environments, showing that the distributed SVM algorithm, the distributed multiclass SVM algorithm, and the distributed SVM ensemble algorithm reduce the training time significantly while maintaining a high level of classification accuracy.
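A minimal sketch of the data-partitioning idea, assuming scikit-learn is available: split the training set into chunks, train an SVM per chunk in parallel, and combine the chunk models by majority vote. The resource-aware scheduling and genetic load balancing described in the thesis are not modelled here.

```python
# Sketch of chunk-parallel SVM training with a majority-vote combination.
import numpy as np
from multiprocessing import Pool
from sklearn.svm import SVC

def train_chunk(chunk):
    X, y = chunk
    return SVC(kernel="rbf", gamma="scale").fit(X, y)

def parallel_svm(X, y, n_chunks=4):
    # Partition the training data into chunks and train one SVM per chunk in parallel.
    chunks = list(zip(np.array_split(X, n_chunks), np.array_split(y, n_chunks)))
    with Pool(n_chunks) as pool:
        return pool.map(train_chunk, chunks)

def ensemble_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])        # (n_models, n_samples)
    # Majority vote per sample
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 16))                           # stand-in image features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)                  # synthetic keyword label
    models = parallel_svm(X, y)
    print(ensemble_predict(models, X[:5]), y[:5])
```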
|
6 |
Effective Gene Expression Annotation Approaches for Mouse Brain Images
January 2016 (has links)
Understanding the complexity of the temporal and spatial characteristics of gene expression over brain development is one of the crucial research topics in neuroscience. An accurate description of the locations and expression status of the relevant genes requires extensive experimental resources. The Allen Developing Mouse Brain Atlas provides a large number of in situ hybridization (ISH) images of gene expression over seven mouse brain developmental stages. Studying mouse brain models helps us understand gene expression in human brains. The atlas covers thousands of genes, which are currently annotated manually by biologists. Due to the high labor cost of manual annotation, investigating an efficient approach to automated gene expression annotation of mouse brain images becomes necessary. In this thesis, a novel, efficient approach based on a machine learning framework is proposed. Features are extracted from raw brain images, and both binary classification and multi-class classification models are built with supervised learning methods. To generate features, one of the most widely adopted methods in current research is the bag-of-words (BoW) algorithm. However, neither the efficiency nor the accuracy of BoW is outstanding when dealing with large-scale data. Thus, an augmented sparse coding method called Stochastic Coordinate Coding is adopted to generate high-level features in this thesis. In addition, a new multi-label classification model is proposed, in which a label hierarchy is built from the given brain ontology structure. Experiments have been conducted on the atlas, and the results show that this approach is efficient and classifies the images with relatively high accuracy. / Dissertation/Thesis / Masters Thesis Computer Science 2016
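As a toy illustration of the multi-label annotation step (not the thesis' Stochastic Coordinate Coding pipeline), the snippet below trains a one-vs-rest multi-label classifier on synthetic features standing in for the image representations; scikit-learn is assumed available.

```python
# One-vs-rest multi-label classification on precomputed (here synthetic) features.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 32))          # stand-in features per ISH image
Y = (X[:, :4] > 0).astype(int)          # 4 synthetic expression labels per image

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
print(clf.predict(X[:3]))               # multi-label predictions for 3 images
```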
|
7 |
A Novel Refinement Method For Automatic Image Annotation Systems
Demircioglu, Ersan, 01 June 2011 (has links) (PDF)
Image annotation can be defined as the process of assigning a set of content-related words to an image. An automatic image annotation system learns the relationships between words and low-level visual descriptors extracted from images, and uses these relationships to annotate a newly seen image. The high demand for image annotation increases the need for automatic image annotation systems. However, the performance of current annotation methods is far from practical usability. The most common problem of current methods is the gap between semantic words and low-level visual descriptors. Because of this semantic gap, the annotation results of these methods contain irrelevant, noisy words. To give more relevant results, refinement methods should be applied to the outputs of classical image annotation.
In this work, we present a novel refinement approach for the image annotation problem. The proposed system attacks the semantic gap problem by using the relationships between words obtained from the dataset. Establishing these relationships is the most crucial problem of the refinement process. In this study, we suggest a probabilistic and fuzzy approach for modelling the relationships among the words in the vocabulary, which is then employed to generate candidate annotations based on the output of the image annotator. Candidate annotations are represented by a set of relational graphs. Finally, one of the generated candidate annotations is selected as the refined annotation result using a clique optimization technique applied to the candidate annotation graph.
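A much simplified sketch of the refinement idea, using assumed co-occurrence strengths rather than the thesis' fuzzy relational graphs and clique optimization: each candidate word's confidence is blended with the support it receives from the other candidates.

```python
# Simplified annotation refinement: re-score candidate words by mixing the
# annotator's confidence with how strongly each word co-occurs (in training
# captions) with the other candidates, then keep the top-ranked words.
def refine(candidates, cooccur, alpha=0.5, top_k=3):
    """candidates: dict word -> annotator confidence.
       cooccur:    dict frozenset({w1, w2}) -> co-occurrence strength in [0, 1]."""
    refined = {}
    for w, conf in candidates.items():
        others = [o for o in candidates if o != w]
        support = sum(cooccur.get(frozenset((w, o)), 0.0) for o in others) / max(len(others), 1)
        refined[w] = alpha * conf + (1 - alpha) * support
    return sorted(refined, key=refined.get, reverse=True)[:top_k]

cands = {"sky": 0.8, "plane": 0.6, "zebra": 0.55, "clouds": 0.5}
co = {frozenset(("sky", "plane")): 0.7, frozenset(("sky", "clouds")): 0.9,
      frozenset(("plane", "clouds")): 0.6}
print(refine(cands, co))   # "zebra" gets no support from the other words and drops out
```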
|
8 |
Automatic Image Annotation By Ensemble Of Visual Descriptors
Akbas, Emre, 01 August 2006 (links) (PDF)
Automatic image annotation is the process of automatically producing words to describe the content of a given image. It provides us with a natural means of semantic indexing for content-based image retrieval. In this thesis, two novel automatic image annotation systems, targeting different types of annotated data, are proposed. The first system, called Supervised Ensemble of Visual Descriptors (SEVD), is trained on a set of annotated images with predefined class labels. The system then automatically annotates an unknown sample depending on the classification results. The second system, called Unsupervised Ensemble of Visual Descriptors (UEVD), assumes no class labels. Therefore, the annotation of an unknown sample is accomplished by unsupervised learning based on the visual similarity of images. The available automatic annotation systems in the literature mostly use a single set of features to train a single learning architecture. In contrast, the proposed annotation systems utilize a novel model of image representation in which an image is represented with a variety of feature sets, spanning almost complete visual information comprising color, shape, and texture characteristics. In both systems, a separate learning entity is trained for each feature set and these entities are gathered under an ensemble learning approach. Empirical results show that both SEVD and UEVD outperform some of the state-of-the-art automatic image annotation systems in equivalent experimental setups.
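The ensemble-of-descriptors idea can be illustrated as below, with synthetic stand-ins for the color, shape, and texture feature sets and scikit-learn assumed available; this is a generic sketch, not the SEVD or UEVD systems themselves.

```python
# One classifier per descriptor space, combined by averaging class posteriors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
y = rng.integers(0, 3, size=300)                          # 3 annotation classes
feature_sets = {name: rng.normal(size=(300, 8)) + y[:, None] * w
                for name, w in [("color", 0.8), ("shape", 0.5), ("texture", 0.3)]}

# A separate learning entity per feature set
models = {name: LogisticRegression(max_iter=1000).fit(X, y)
          for name, X in feature_sets.items()}

# Ensemble decision: average the per-descriptor class probabilities
probs = np.mean([models[n].predict_proba(feature_sets[n]) for n in models], axis=0)
print("ensemble accuracy:", np.mean(probs.argmax(axis=1) == y))
```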
|
9 |
Hanolistic: A Hierarchical Automatic Image Annotation System Using Holistic Approach
Oztimur, Ozge, 01 January 2008 (links) (PDF)
Automatic image annotation is the process of assigning keywords to digital images depending on their content. In one sense, it is a mapping from visual content information to semantic context information. In this thesis, we propose a novel approach to the automatic image annotation problem, in which the annotation is formulated as a multivariate mapping from a set of independent descriptor spaces, representing a whole image, to a set of words, representing class labels. For this purpose, a hierarchical annotation architecture, named HANOLISTIC (Hierarchical Image Annotation System Using Holistic Approach), is defined with two layers. At the first layer, called the level-0 annotator, each annotator is fed with a distinct set of descriptors extracted from the whole image. This enables us to represent the image at each annotator by a different visual property of a descriptor. Since we use the whole image, the problematic segmentation process is avoided. Training of each annotator is accomplished by a supervised learning paradigm, where each word is represented by a class label. Note that this approach is slightly different from classical training approaches, where each data item has a unique label. In the proposed system, since each image has one or more annotation words, we assume that an image belongs to more than one class. The outputs of the level-0 annotators indicate the membership values of the words in the vocabulary to a given image. These membership values from each annotator are then aggregated at the second layer, using various rules, to obtain the meta-layer annotator. The rules employed in this study involve summation and/or weighted summation of the outputs of the level-0 annotators. Finally, a set of words from the vocabulary is selected based on the ranking of the output of the meta-layer. The hierarchical annotation system proposed in this thesis outperforms state-of-the-art annotation systems based on segmental and holistic approaches. The proposed system is examined in depth and compared to other systems in the literature by means of several performance criteria.
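A small sketch of the meta-layer aggregation described above, with hypothetical membership values and weights: each level-0 annotator scores every vocabulary word, the meta-layer takes a weighted sum, and the top-ranked words become the annotation.

```python
# Weighted-sum aggregation of level-0 membership values, then top-k word selection.
import numpy as np

vocab = ["sea", "sky", "boat", "tree", "car"]
level0_outputs = {                      # membership values from 3 level-0 annotators
    "color":   np.array([0.9, 0.8, 0.3, 0.2, 0.1]),
    "texture": np.array([0.7, 0.6, 0.5, 0.3, 0.2]),
    "shape":   np.array([0.2, 0.3, 0.8, 0.1, 0.4]),
}
weights = {"color": 1.0, "texture": 0.5, "shape": 1.5}   # hypothetical weights

meta = sum(weights[name] * scores for name, scores in level0_outputs.items())
top = [vocab[i] for i in np.argsort(meta)[::-1][:3]]
print(top)                                               # e.g. ['sea', 'boat', 'sky']
```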
|
10 |
Image Annotation With Semi-supervised Clustering
Sayar, Ahmet, 01 December 2009 (links) (PDF)
Image annotation is defined as generating a set of textual words for a given image, learning from the available training data consisting of visual image content and annotation words.
Methods developed for image annotation usually make use of region clustering algorithms to quantize the visual information. Visual codebooks are generated from the region clusters of low-level visual features. These codebooks are then matched, in various ways, with the words of the text document related to the image.
In this thesis, we propose a new image annotation technique which improves the representation and quantization of the visual information by employing available but unused information, called side information, which is hidden in the system. This side information is used to semi-supervise the clustering process that creates the visterms. The selection of side information depends on the visual image content, the annotation words, and the relationship between them. Although there may be many different ways of defining and selecting side information, in this thesis three types of side information are proposed. The first is the hidden topic probability information obtained automatically from the text document associated with the image. The second is the orientation, and the third is the color information around interest points that correspond to critical locations in the image. The side information provides a set of constraints in a semi-supervised K-means region clustering algorithm. Consequently, in the generation of the visual terms (visterms) from the regions, not only are the low-level features clustered, but the side information is also used to complement the visual information. This complementary information is expected to close the semantic gap between the low-level features extracted from each region and the high-level textual information. Therefore, a better match between the visual codebook and the annotation words is obtained. Moreover, a speedup is obtained in the modified K-means algorithm because of the constraints introduced by the side information. The proposed algorithm is implemented in a high-performance parallel computation environment.
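As one possible reading of the constrained clustering step (an illustration, not the thesis' exact algorithm), the sketch below runs a COP-style K-means in which cannot-link pairs derived from side information are respected during cluster assignment.

```python
# Constraint-based (COP-style) K-means: cannot-link pairs, assumed to come from
# side information, are checked when assigning each point to its nearest feasible cluster.
import numpy as np

def constrained_kmeans(X, k, cannot_link, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        for i, x in enumerate(X):
            # Try clusters from nearest to farthest, skipping any that would
            # violate a cannot-link constraint with an already-assigned point.
            order = np.argsort(((centers - x) ** 2).sum(axis=1))
            labels[i] = next((c for c in order
                              if not any(labels[j] == c for j in cannot_link.get(i, ()))),
                             order[0])
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers

# Toy regions: three clusters of 20 points each, plus one cannot-link pair.
X = np.vstack([np.random.default_rng(1).normal(m, 0.2, size=(20, 2)) for m in (0, 1, 2)])
cannot = {0: [59], 59: [0]}          # side information: regions 0 and 59 must differ
labels, _ = constrained_kmeans(X, k=3, cannot_link=cannot)
print(labels[:5], labels[-5:])
```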
|