About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. It is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

Quad-Tree based Image Encoding Methods for Data-Adaptive Visual Feature Learning / データ適応型特徴学習のための四分木に基づく画像の構造的表現法

Zhang, Cuicui 23 March 2015 (has links)
Kyoto University / 0048 / New-system doctoral program / Doctor of Informatics / Kou No. 19111 / Joho No. 557 / Call number 新制||情||98 (University Library) / 32062 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Takashi Matsuyama, Professor Michihiko Minoh, Associate Professor Xuefeng Liang / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
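The record gives only the title and degree metadata. As a rough illustration of the quad-tree idea the title refers to, the sketch below recursively splits a grayscale image into blocks until each block is roughly homogeneous; the variance threshold, minimum block size, and random test image are arbitrary illustrative choices, not details taken from the thesis.

    # Minimal quad-tree decomposition of a grayscale image (illustrative only).
    # A block is split while its intensity variance exceeds a threshold and it
    # is larger than a minimum size; leaves are returned as (x, y, w, h) boxes.
    import numpy as np

    def quadtree_blocks(img, x=0, y=0, w=None, h=None, var_thresh=100.0, min_size=8):
        if w is None:
            h, w = img.shape
        block = img[y:y + h, x:x + w]
        if w <= min_size or h <= min_size or block.var() <= var_thresh:
            return [(x, y, w, h)]
        hw, hh = w // 2, h // 2
        blocks = []
        for dx, dy, bw, bh in [(0, 0, hw, hh), (hw, 0, w - hw, hh),
                               (0, hh, hw, h - hh), (hw, hh, w - hw, h - hh)]:
            blocks += quadtree_blocks(img, x + dx, y + dy, bw, bh, var_thresh, min_size)
        return blocks

    img = np.random.randint(0, 256, (64, 64)).astype(float)  # stand-in image
    print(len(quadtree_blocks(img)))                          # number of leaf blocks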
2

Exploração visual do espaço de características: uma abordagem para análise de imagens via projeção de dados multidimensionais / Visual feature space exploration: an approach to image analysis via multidimensional data projection

Machado, Bruno Brandoli 13 December 2010 (has links)
Image analysis systems rely on the premise that the dataset under investigation is correctly represented by features. However, defining a set of features that properly represents a dataset is still a challenging and, in most cases, exhausting task. Most of the available techniques, especially when a large number of features is considered, are based on purely quantitative statistical measures or on artificial intelligence approaches, and are normally black boxes to the user. The approach proposed in this dissertation seeks to open this black box by means of visual representations created with the Multidimensional Classical Scaling projection technique, enabling users to gain insight into the meaning and representativeness of the features computed by different feature extraction algorithms and sets of parameters. The approach is evaluated on six image datasets containing textures, medical images and outdoor scenes. The results show that, as the combination of feature sets and changes in parameters improves the quality of the visual representation, the classification accuracy for the computed features also improves. To reduce the subjectivity of conclusions based purely on visual analysis, the quality of the representations is measured with the silhouette index, a measure originally proposed to evaluate the results of clustering algorithms. Moreover, the visual exploration of the datasets under analysis enables users to investigate one of the greatest challenges in data classification: the presence of intra-class variation. The results strongly suggest that this approach can be successfully employed as a guide to help experts explore, refine, and define the features that properly represent an image dataset.
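A minimal sketch of the evaluation loop this abstract describes, assuming the computed features are plain NumPy arrays and using scikit-learn's metric MDS as a stand-in for Multidimensional Classical Scaling: project a candidate feature set to 2-D and score how well the classes separate with the silhouette index.

    # Higher silhouette ~ visually better-separated projection for this feature set.
    from sklearn.datasets import load_digits
    from sklearn.manifold import MDS
    from sklearn.metrics import silhouette_score

    X, y = load_digits(return_X_y=True)   # stand-in for computed image features
    X, y = X[:300], y[:300]               # keep the example small

    proj = MDS(n_components=2, random_state=0).fit_transform(X)
    print("silhouette of this feature set:", silhouette_score(proj, y))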
3

Vizualizace konceptů pomocí generování obrazu / Towards concept visualization through image generation

Nguyen, Tien Dat January 2016 (has links)
Title: Towards concept visualization through image generation. Author: Tien Dat Nguyen. Department: Institute of Formal and Applied Linguistics. Supervisors: Pavel Pecina (Charles University in Prague), Angeliki Lazaridou, Raffaella Bernardi, Marco Baroni (University of Trento). Abstract: Computational linguistics and computer vision have a common way to embed the semantics of linguistic/visual units through vector representations. In addition, high-quality semantic representations can be constructed effectively thanks to recent advances in neural network methods. Nevertheless, the understanding of these representations remains limited, so they need to be assessed in an intuitive way. Cross-modal mapping maps between the vector semantic embeddings of words and the visual representations of the corresponding objects in images. Inverting image representations involves learning an inversion of visual vectors (SIFT, HOG and CNN features) to reconstruct the original image. The goal of this project is to build a complete pipeline in which word representations are transformed into image vectors using cross-modal mapping, and these vectors are then projected to pixel space using inversion. This suggests that there might be a groundbreaking way to inspect and evaluate the semantics encoded in word representations by...
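A minimal sketch of a linear cross-modal mapping of the kind described above, learned with ridge regression on made-up data; the embedding dimensions, the regularization strength, and the random vectors are arbitrary, and the inversion from visual vectors back to pixels is not shown.

    # Hypothetical cross-modal mapping: learn a linear map from word embeddings
    # to visual feature vectors, then map unseen words into visual space.
    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    word_vecs = rng.normal(size=(500, 300))      # e.g. word2vec-style embeddings
    visual_vecs = rng.normal(size=(500, 4096))   # e.g. CNN features of the objects

    mapper = Ridge(alpha=1.0).fit(word_vecs[:400], visual_vecs[:400])
    predicted_visual = mapper.predict(word_vecs[400:])   # mapped into visual space
    print(predicted_visual.shape)                        # (100, 4096)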
4

Learning Visual Feature Hierarchies

Scalzo, Fabien 04 December 2007 (has links)
This thesis addresses visual object recognition, which remains a major challenge in computer vision: despite more than twenty years of research, many facets of the problem are still unsolved. Designing an object recognition system essentially rests on three aspects: representation, detection, and machine learning. The main contribution of this thesis is a generic framework for the statistical representation of visual features and their detection in images. The proposed model combines several concepts recently introduced in computer vision, machine learning, and neuroscience: spatial relations between visual features, graphical models, and hierarchies of complex cells. The result of this combination takes the form of a hierarchy of visual feature classes. Its main interest is to provide a model that captures both local and global visual aspects, using the geometric structure and the appearance of objects. Graphical models offer a probabilistic framework for representing the hierarchies and using them for inference. A recently proposed message-passing algorithm (NBP) is used to infer the position of features in images. During learning, hierarchies are built incrementally starting from low-level features. The algorithm is based on co-occurrence analysis and estimates both the structure and the parameters of the hierarchies. The performance of this new system is evaluated on several object databases of increasing difficulty. In addition, a survey of the state of the art in object recognition methods and feature detectors gives an overview of the field.
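An illustrative sketch of co-occurrence analysis on synthetic feature detections, in the spirit of the incremental hierarchy construction described above (this is not the thesis's actual learning algorithm): pairs of low-level feature classes that are frequently detected near each other become candidates for higher-level classes.

    # Detections are (class_id, x, y) tuples; the data and distance threshold
    # are made up for illustration.
    from collections import Counter
    from itertools import combinations
    import random

    random.seed(0)
    detections_per_image = [
        [(random.randint(0, 4), random.uniform(0, 100), random.uniform(0, 100))
         for _ in range(20)]
        for _ in range(50)
    ]

    pair_counts = Counter()
    for dets in detections_per_image:
        for (c1, x1, y1), (c2, x2, y2) in combinations(dets, 2):
            if ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5 < 15:   # "nearby" pairs
                pair_counts[tuple(sorted((c1, c2)))] += 1

    # The most frequently co-occurring pairs are candidates for new parent classes.
    print(pair_counts.most_common(3))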
5

Content-based search and browsing in semantic multimedia retrieval

Rautiainen, M. (Mika) 04 December 2006 (has links)
Abstract: Growth in storage capacity has led to large digital video repositories and complicated the discovery of specific information without laborious manual annotation of data. The research focuses on creating a retrieval system that is ultimately independent of manual work. To retrieve relevant content, the semantic gap between the searcher's information need and the content data has to be overcome using content-based technology. The semantic gap consists of two distinct elements: the ambiguity of the true information need and the equivocalness of digital video data. The research problem of this thesis is: what computational content-based models for retrieval increase the effectiveness of the semantic retrieval of digital video? The hypothesis is that semantic search performance can be improved using pattern recognition, data abstraction and clustering techniques jointly with human interaction through manually created queries and visual browsing. The results of this thesis are composed of: an evaluation of two perceptually oriented colour spaces, with details on the applicability of the HSV and CIE Lab spaces for low-level feature extraction; the development and evaluation of low-level visual features in example-based retrieval for image and video databases; the development and evaluation of a generic model for simple and efficient concept detection from video sequences, with good detection performance on large video corpora; the development of combination techniques for multi-modal visual, concept and lexical retrieval; and the development of a cluster-temporal browsing model as a data navigation tool and its evaluation on several large and heterogeneous collections containing an assortment of video, from educational and historical recordings to contemporary broadcast news, commercials and a multilingual television broadcast. The methods introduced here have been found to facilitate semantic queries for novice users without laborious manual annotation. Cluster-temporal browsing was found to outperform the conventional approach, which consists of sequential queries and relevance feedback, in semantic video retrieval by a statistically significant margin.
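A minimal sketch of one kind of low-level colour feature evaluated in such systems, assuming a video frame is available as an RGB array with values in [0, 1]; the HSV bin counts are arbitrary and this is not the thesis's exact descriptor.

    # Coarse HSV histogram of a frame (a random array stands in for the frame).
    import numpy as np
    import colorsys

    frame = np.random.rand(120, 160, 3)                      # stand-in RGB frame
    hsv = np.array([colorsys.rgb_to_hsv(*px) for px in frame.reshape(-1, 3)])

    hist, _ = np.histogramdd(hsv, bins=(8, 4, 4), range=((0, 1), (0, 1), (0, 1)))
    feature = (hist / hist.sum()).ravel()                    # normalised 128-D feature
    print(feature.shape)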
6

Sentiment Analysis on Multi-view Social Data

Niu, Teng January 2016 (has links)
With the proliferation of social networks, people are likely to share their opinions about news, social events and products on the Web. There is an increasing interest in understanding users' attitudes or sentiment from the large repository of opinion-rich data on the Web, which can benefit many commercial and political applications. Early research concentrated primarily on text documents such as users' comments on purchased products. Recent work shows that visual appearance also conveys rich human affect that can be predicted. While great effort has been devoted to single media, either text or image, few attempts have been made at the joint analysis of multi-view data, which is becoming a prevalent form in social media. For example, paired with the textual messages posted on Twitter, users are likely to upload images and videos which may carry their affective states. One common obstacle is the lack of sufficient manually annotated instances for model learning and performance evaluation. To promote research on this problem, we introduce a multi-view sentiment analysis dataset (MVSA) consisting of a set of manually annotated image-text pairs collected from Twitter. The dataset can be utilized as a valuable benchmark for both single-view and multi-view sentiment analysis. In this thesis, we further conduct a comprehensive study on the computational analysis of sentiment from multi-view data. State-of-the-art approaches for single-view (image or text) and multi-view (image and text) data are introduced and compared through extensive experiments conducted on our constructed dataset and other public datasets. More importantly, the effectiveness of exploiting the correlation between different views is also studied using widely used fusion strategies and advanced multi-view feature extraction methods.
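A minimal sketch of one widely used fusion strategy of the kind mentioned above, late fusion: average the class probabilities predicted independently from the text view and the image view. The probabilities and the view weight are made up for illustration; real systems would obtain them from trained text and image models.

    import numpy as np

    # per-sample probabilities over (negative, neutral, positive)
    p_text  = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.3, 0.6]])
    p_image = np.array([[0.5, 0.3, 0.2],
                        [0.2, 0.2, 0.6]])

    w = 0.6                                   # weight on the text view (assumed)
    p_fused = w * p_text + (1 - w) * p_image
    print(p_fused.argmax(axis=1))             # fused sentiment labels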
7

Feature selection through visualisation for the classification of online reviews

Koka, Keerthika 17 April 2017 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / The purpose of this work is to demonstrate that visualization is at least as powerful as the best automatic feature selection algorithms. This is achieved by applying our visualization technique to the classification of online reviews into fake and genuine reviews. Our technique uses radial charts and color overlaps to explore feature selection visually for classification. Every review is treated as a radial translucent red or blue membrane, with its dimensions determining the shape of the membrane. This work also shows how dimension ordering and combination are relevant to the feature selection process. In brief, the idea is to give each text review a structure based on certain attributes, compare how similar or different the structures of the same or different categories are, and highlight the key features that contribute most to the classification. Colors and saturations aid in the feature selection process. Our visualization technique helps the user gain insight into high-dimensional data by providing means to eliminate the worst features right away, pick strong features without statistical aids, and understand the behavior of the dimensions in different combinations.
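A rough sketch of the radial encoding described above, using matplotlib's polar axes: each review becomes a translucent polygon, red for fake and blue for genuine, with one axis per feature. The feature matrix is random and the axis ordering is the identity, so this only illustrates the plotting idea, not the thesis's ordering strategy.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    n_features = 8
    fake    = rng.random((20, n_features)) * [1.0, 0.9, 0.2, 0.8, 0.3, 0.7, 0.6, 0.4]
    genuine = rng.random((20, n_features)) * [0.3, 0.4, 0.9, 0.2, 0.8, 0.3, 0.5, 0.6]

    angles = np.linspace(0, 2 * np.pi, n_features, endpoint=False)
    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    for row in fake:        # translucent red membranes
        ax.fill(np.append(angles, angles[0]), np.append(row, row[0]),
                color="red", alpha=0.05)
    for row in genuine:     # translucent blue membranes
        ax.fill(np.append(angles, angles[0]), np.append(row, row[0]),
                color="blue", alpha=0.05)
    plt.show()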
8

Lipreading across multiple views

Lucey, Patrick Joseph January 2007 (has links)
Visual information from a speaker's mouth region is known to improve automatic speech recognition (ASR) robustness, especially in the presence of acoustic noise. Currently, the vast majority of audio-visual ASR (AVASR) studies assume frontal images of the speaker's face, which is a rather restrictive human-computer interaction (HCI) scenario. The lack of research into AVASR across multiple views has been dictated by the lack of large corpora that contain varying pose/viewpoint speech data. Recently, research has concentrated on recognising human behaviours within "meeting" or "lecture" type scenarios via "smart rooms". This has resulted in the collection of audio-visual speech data which allows the recognition of visual speech from both frontal and non-frontal views. Using this data, the main focus of this thesis was to investigate and develop various methods, within the confines of a lipreading system, which can recognise visual speech across multiple views. This research constitutes the first published work within the field which looks at this particular aspect of AVASR. The task of recognising visual speech from non-frontal views (i.e. profile) is in principle very similar to that of frontal views, requiring the lipreading system to initially locate and track the mouth region and subsequently extract visual features. However, this task is far more complicated than the frontal case, because the facial features required to locate and track the mouth lie in a much more limited spatial plane. Nevertheless, accurate mouth region tracking can be achieved by employing techniques similar to frontal facial feature localisation. Once the mouth region has been extracted, the same visual feature extraction process as in the frontal view can take place. A novel contribution of this thesis is to quantify the degradation in lipreading performance between the frontal and profile views. In addition, novel patch-based analysis of the various views is conducted, and as a result a novel multi-stream patch-based representation is formulated. Having a lipreading system which can recognise visual speech from both frontal and profile views is a novel contribution to the field of AVASR. However, given both the frontal and profile viewpoints, this raises the question: is there any benefit to having the additional viewpoint? Another major contribution of this thesis is an exploration of a novel multi-view lipreading system. This system shows that there does exist complementary information in the additional viewpoint (possibly that of lip protrusion), with superior performance achieved by the multi-view system compared to the frontal-only system. Even though a multi-view lipreading system which can recognise visual speech from both frontal and profile views is very beneficial, it can hardly be considered realistic, as each particular viewpoint is dedicated to a single pose (i.e. front or profile). In an effort to make the lipreading system more realistic, a unified system based on a single camera was developed which enables a lipreading system to recognise visual speech from both frontal and profile poses. This is called pose-invariant lipreading. Pose-invariant lipreading can be performed on either stationary or continuous tasks.
Methods which effectively normalise the various poses into a single pose were investigated for the stationary scenario, and in another contribution of this thesis an algorithm based on regularised linear regression was employed to project all the visual speech features into a uniform pose. This particular method is shown to be beneficial when the lipreading system is biased towards the dominant pose (i.e. frontal). The final contribution of this thesis is the formulation of a continuous pose-invariant lipreading system which contains a pose estimator at the start of the visual front-end. This system highlights the complexity of developing such a system, as introducing more flexibility within the lipreading system invariably means the introduction of more error. All the work contained in this thesis presents novel and innovative contributions to the field of AVASR, and will hopefully aid in the future deployment of an AVASR system in realistic scenarios.
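A minimal sketch of pose normalisation by regularised linear regression, assuming paired profile and frontal feature vectors are available for training; the feature dimensionality, the synthetic data, and the ridge penalty are arbitrary illustrative choices, not the thesis's actual features or settings.

    # Learn a map from profile-view visual speech features to the frontal
    # feature space, then apply it to new profile features.
    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    frontal_feats = rng.normal(size=(1000, 40))          # e.g. appearance features
    profile_feats = frontal_feats @ rng.normal(size=(40, 40)) \
                    + 0.1 * rng.normal(size=(1000, 40))  # synthetic paired frames

    pose_map = Ridge(alpha=10.0).fit(profile_feats, frontal_feats)
    normalised = pose_map.predict(profile_feats[:5])     # now in the frontal space
    print(normalised.shape)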
9

Visual exploration to support the identification of relevant attributes in time-varying multivariate data / Visualização como apoio à identificação de atributos relevantes em dados multidimensionais variantes no tempo

Vargas, Aurea Rossy Soriano 19 March 2018 (has links)
Ionospheric scintillation is a rapid variation in the amplitude and/or phase of radio signals traveling through the ionosphere. This spatial and time-varying phenomenon is of interest because its occurrence may affect the reception quality of satellite signals. Specialized receivers at strategic regions can track multiple variables related to the phenomenon, generating a database of historical observations on the regional behavior of ionospheric scintillation. The analysis of such data is very challenging, since it consists of time-varying measurements of many variables which are heterogeneous in nature, possibly with many missing values, recorded over extensive time periods. There is a need for alternative, intuitive strategies that help experts acquire further knowledge from the ionospheric scintillation data. These challenges motivated a study of the applicability of visualization techniques to support the identification of relevant attributes in the study of the behavior of phenomena described by multiple time-varying variables, of which ionospheric scintillation is a good example. In particular, this thesis introduces a visual analytics framework, named TV-MV Analytics, that supports exploratory tasks on time-varying multivariate data and was developed following the requirements of experts on ionospheric scintillation from the Faculty of Science and Technology of UNESP at Presidente Prudente, Brazil. TV-MV Analytics provides analysts with an interactive visual exploration loop for inspecting the behavior of multiple variables at different temporal scales, through temporal representations associated with clustering and multidimensional projection techniques. Analysts can also assess how different feature sub-spaces contribute to characterizing a certain behavior, allowing them to direct the analysis process and include their domain knowledge in the exploratory analysis. We also illustrate the application of TV-MV Analytics on multivariate time-varying data sets from three alternative application domains. Experimental results indicate that the proposed solutions show good potential for assisting time-varying multivariate data mining tasks, since they reduce the effort required from experts to gain deeper insight into the historical behavior of the variables describing a phenomenon or domain.
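A rough sketch of the kind of clustering-plus-projection step such a framework relies on, run here on synthetic data: slice a multivariate time series into fixed-length windows, cluster the windows, and project them to 2-D for visual inspection. PCA stands in for the multidimensional projection, and the window length and cluster count are arbitrary.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    series = rng.normal(size=(2000, 6))                  # 2000 time steps, 6 variables
    win = 50
    windows = np.array([series[i:i + win].ravel()        # one vector per window
                        for i in range(0, len(series) - win, win)])

    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(windows)
    coords = PCA(n_components=2).fit_transform(windows)  # 2-D layout for plotting
    print(coords.shape, np.bincount(labels))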
