21 |
Investigating the relationship between the distribution of local semantic concepts and local keypoints for image annotation
Alqasrawi, Yousef T. N.; Neagu, Daniel. January 2014 (has links)
The problem of image annotation has gained increasing attention from many researchers in computer vision, yet few works have addressed the use of bag of visual words for scene annotation at region level. The aim of this paper is to study the relationship between the distribution of local semantic concepts and local keypoints located in image regions labelled with these semantic concepts. Based on this study, we investigate whether the bag of visual words model can efficiently represent the content of natural scene image regions, so that images can be annotated with local semantic concepts. The paper also presents a local-from-global approach, which studies the influence of using visual vocabularies generated from general scene categories to build bags of visual words at region level. Extensive experiments are conducted on a natural scene dataset with six categories. The reported results show the plausibility of using the BOW model to represent the semantic information of image regions.
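To make the region-level representation concrete, below is a minimal sketch (illustrative only, not the authors' code) of building a bag-of-visual-words histogram for a labelled region: keypoints are detected over the image, quantised against a k-means vocabulary, and only those falling inside the region contribute to the histogram. The SIFT detector and the vocabulary size are assumptions, since the paper's exact settings are not reproduced here.

```python
# Region-level bag of visual words: detect keypoints, quantise against a k-means
# vocabulary, keep only keypoints whose coordinates fall inside the labelled region.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptor_list, k=200):
    """Cluster pooled SIFT descriptors from many training images into k visual words."""
    all_desc = np.vstack(descriptor_list)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_desc)

def region_bow_histogram(image_bgr, region_mask, vocabulary):
    """Histogram of visual words for the keypoints located inside one labelled region."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    hist = np.zeros(vocabulary.n_clusters)
    if descriptors is None:
        return hist
    words = vocabulary.predict(descriptors.astype(np.float32))
    for kp, w in zip(keypoints, words):
        x = min(int(round(kp.pt[0])), region_mask.shape[1] - 1)
        y = min(int(round(kp.pt[1])), region_mask.shape[0] - 1)
        if region_mask[y, x]:          # keep only keypoints falling in this region
            hist[w] += 1
    return hist / max(hist.sum(), 1)   # L1-normalise so regions of different size compare
```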
|
22 |
Extension automatique de l'annotation d'images pour la recherche et la classification / Automatic image annotation extension for search and classification
Bouzayani, Abdessalem. 09 May 2018 (has links)
This thesis addresses the problem of image annotation extension. The rapid growth of available visual content has created a need for multimedia indexing and retrieval techniques, and image annotation allows easy and fast indexing and searching in large image collections. Starting from image databases that are only partially annotated by hand, we aim to complete their annotations automatically in order to make image retrieval and/or classification methods more effective. For automatic annotation extension we use probabilistic graphical models. The proposed model is a mixture of multinomial distributions and Gaussian mixtures, in which visual and textual features are combined. To reduce the cost of manual annotation and improve the quality of the obtained annotations, we incorporate user feedback into the model, using learning within learning, incremental learning and active learning. To narrow the semantic gap and enrich the image annotations, we use a semantic hierarchy that models numerous semantic relationships between annotation keywords, and we present a semi-automatic method to build such a hierarchy from a set of keywords. Once built, the hierarchy is integrated into the annotation model; the resulting model is a mixture of Bernoulli distributions and Gaussian mixtures.
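As a rough illustration of how a Gaussian mixture over visual features can be tied to keyword distributions for annotation extension, the following sketch scores candidate keywords for an image as p(word | image) = Σ_k p(k | visual) · p(word | k). It is a simplification under assumed data structures, not the thesis model, which additionally integrates user feedback and the semantic hierarchy.

```python
# Simplified annotation extension: a GMM over visual features, with per-component
# multinomial word distributions estimated from the partially annotated images.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_annotation_model(visual_feats, tag_matrix, n_components=16):
    """visual_feats: (n_images, d) array; tag_matrix: (n_images, n_words) 0/1 array."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=0).fit(visual_feats)
    resp = gmm.predict_proba(visual_feats)                   # (n_images, K)
    word_counts = resp.T @ tag_matrix + 1.0                  # Laplace-smoothed (K, n_words)
    word_dist = word_counts / word_counts.sum(axis=1, keepdims=True)
    return gmm, word_dist

def extend_annotation(gmm, word_dist, visual_feat, top_n=5):
    """Suggest the top_n keyword indices for a new or partially annotated image."""
    post = gmm.predict_proba(visual_feat.reshape(1, -1))[0]  # p(k | visual)
    scores = post @ word_dist                                # p(word | visual)
    return np.argsort(scores)[::-1][:top_n]
```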
|
23 |
Analysis and measurement of visuospatial complexity
Al Saleh, Alissar. January 2023 (has links)
The thesis performs an analysis of the visuospatial complexity of dynamic scenes, and more specifically driving scenes, with the purpose of gaining knowledge of how humans visually perceive the information present in a typical driving scene. The analysis and measurement of visual complexity is performed using two models for measuring visual clutter, the Feature Congestion clutter measure [1] and the Subband Entropy clutter measure [1], introduced by Rosenholtz, a cognitive science researcher. The thesis reports the performance of these computational models on a data set consisting of six episodes that simulate driving scenes with different settings and combinations of visual features. The results of evaluating the measures are used to introduce a formula for measuring the visual complexity of annotated images, by extracting valuable information from the annotated data set using Scalabel [2], a web-based open-source annotation tool.
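The sketch below gives a rough, illustrative approximation of a subband-entropy-style clutter score: decompose the image into subbands and average the Shannon entropy of the coefficient histograms. Rosenholtz's published measure uses oriented steerable pyramids on luminance and chrominance channels; the grayscale PyWavelets decomposition here is a simplifying assumption.

```python
# Approximate subband-entropy clutter: wavelet-decompose the image and average the
# entropy of each detail subband's coefficient histogram.
import numpy as np
import pywt

def subband_entropy(coeffs, bins=64):
    hist, _ = np.histogram(coeffs.ravel(), bins=bins)
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def clutter_score(gray_image, wavelet="db4", levels=3):
    """gray_image: 2-D float array in [0, 1]. Higher score = more visual clutter."""
    decomposition = pywt.wavedec2(gray_image, wavelet, level=levels)
    entropies = []
    for detail_level in decomposition[1:]:          # skip the coarse approximation band
        for band in detail_level:                   # horizontal, vertical, diagonal details
            entropies.append(subband_entropy(band))
    return float(np.mean(entropies))
```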
|
24 |
Représentations visuelles de concepts textuels pour la recherche et l'annotation interactives d'images / Keyword visual representation for interactive image retrieval and image annotation
Nguyen, Nhu Van. 09 September 2011 (has links)
In image retrieval today, we often manipulate large volumes of images, which may vary or even arrive continuously. In an image database we thus end up with both old and new images, the former possibly already indexed and annotated and the latter waiting for indexing or annotation. Since the database is not annotated uniformly, it is difficult to access it through text queries. In this work we present different techniques to interact with, navigate and search this type of image database. First, a short-term interaction model is used to improve the precision of the system. Second, based on a long-term interaction model, we propose to associate textual keywords with visual features so that images can be searched by text, by visual content, or by a mix of the two; this retrieval model makes it possible to iteratively refine the annotation and the knowledge of the images. We identify four contributions in this work. The first contribution is a multimodal image retrieval system that integrates different sources of data, such as image content and text, and can be queried by image, by keyword, or with hybrid text/visual queries. The second contribution is a new relevance feedback technique combining two classic techniques widely used in information retrieval: query point movement and query expansion. By exploiting non-relevant images as well as the advantages of both classic techniques, our method gives very good results for effective interactive image retrieval. The third contribution is a model named "Bags of KVR" (Keyword Visual Representation) that creates links between semantic concepts and visual representations, building on the bag-of-words model. Thanks to an incremental learning strategy, this model provides the association between semantic concepts and visual features, which helps improve the precision of image annotation and the retrieval performance; the visual representation of a textual concept also lets users query the system with text or mixed text/image queries even when the image database is only partially annotated. The fourth contribution, under the assumption that knowledge is not available at the start in most image retrieval systems, is a mechanism for incremental construction of knowledge from scratch: we do not separate the annotation and retrieval phases, so the user can issue queries as soon as the system starts while the system learns incrementally as it is used. These contributions are completed by an interface for visualization and mixed text/visual querying. Although only two types of information are used for now, namely text and visual content, the genericity of the proposed model allows its extension to other types of information external to the image, such as location (GPS) and time.
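To illustrate the second contribution, here is a minimal sketch of relevance feedback that combines query point movement (a Rocchio-style update) with query expansion (extra query points drawn from the relevant examples). The weights, the cosine similarity, and the expansion heuristic are assumptions, not the thesis' actual settings.

```python
# Relevance feedback: move the query point with Rocchio, then expand the query with
# additional points taken from the user's relevant examples.
import numpy as np

def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward relevant feedback and away from non-relevant feedback."""
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(non_relevant):
        q -= gamma * np.mean(non_relevant, axis=0)
    return q

def expanded_queries(query, relevant, non_relevant, n_expansions=2):
    """Combine query point movement with query expansion: return several query points."""
    moved = rocchio_update(query, np.asarray(relevant), np.asarray(non_relevant))
    extras = list(np.asarray(relevant, dtype=float)[:n_expansions]) if len(relevant) else []
    return [moved] + extras

def rank_database(queries, database):
    """Score each database vector by its best cosine similarity to any query point."""
    db = np.asarray(database, dtype=float)
    db = db / (np.linalg.norm(db, axis=1, keepdims=True) + 1e-12)
    best = np.zeros(len(db))
    for q in queries:
        qn = q / (np.linalg.norm(q) + 1e-12)
        best = np.maximum(best, db @ qn)
    return np.argsort(best)[::-1]                    # indices, most similar first
```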
|
25 |
Handling Imperfections for Multimodal Image Annotation
Znaidia, Amel. 11 February 2014 (has links) (PDF)
This thesis deals with multimodal image annotation in the context of social media. We seek to take advantage of textual (tags) and visual information in order to enhance image annotation performance. However, these tags are often noisy, overly personalized, and only a few of them are related to the semantic visual content of the image. In addition, when combining prediction scores from different classifiers learned on different modalities, multimodal image annotation must cope with their imperfections (uncertainty, imprecision and incompleteness). Consequently, we consider that multimodal image annotation is subject to imperfections at two levels: the representation and the decision. Inspired by information fusion theory, we focus in this thesis on defining, identifying and handling these imperfection aspects in order to improve image annotation.
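As a loose illustration of decision-level fusion under imperfect classifiers (not the information-fusion machinery the thesis builds on), the sketch below discounts each modality's scores by a reliability weight before combining them; the reliability weights are assumed to be estimated on held-out data.

```python
# Reliability-discounted late fusion of per-modality classifier scores.
import numpy as np

def discounted_fusion(scores_per_modality, reliabilities):
    """scores_per_modality: list of (n_labels,) score arrays in [0, 1], one per modality.
    reliabilities: list of scalars in [0, 1], e.g. validation accuracy of each modality."""
    fused = np.zeros_like(np.asarray(scores_per_modality[0], dtype=float))
    total = 0.0
    for scores, r in zip(scores_per_modality, reliabilities):
        fused += r * np.asarray(scores, dtype=float)   # discount unreliable modalities
        total += r
    return fused / max(total, 1e-12)

# Example: the visual classifier is trusted more than the noisy-tag classifier.
visual_scores = np.array([0.9, 0.2, 0.4])
tag_scores = np.array([0.3, 0.8, 0.1])
print(discounted_fusion([visual_scores, tag_scores], reliabilities=[0.8, 0.4]))
```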
|
26 |
A Framework for Fashion Data Gathering, Hierarchical-Annotation and Analysis for Social Media and Online Shop: TOOLKIT FOR DETAILED STYLE ANNOTATIONS FOR ENHANCED FASHION RECOMMENDATION
Wara, Ummul. January 2018 (has links)
Due to the transformation of recommendation systems from content-based to hybrid cross-domain-based, there is a need for a social-network dataset that provides sufficient data as well as detail-level annotation from a predefined hierarchical clothing category and attribute-based vocabulary, taking user interactions into account. However, existing fashion-based datasets lack either a hierarchical category-based representation or the user interactions of a social network. This thesis presents two datasets: one from the photo-sharing platform Instagram, which gathers fashionistas' images with all possible user interactions, and another from the online shop Zalando with detailed information for every garment. We present the design of a customized crawler that enables the user to crawl data based on category or attributes. Moreover, an efficient and collaborative web solution is designed and implemented to facilitate large-scale, hierarchical, category-based, detail-level annotation of the Instagram data. By considering all user interactions, the developed solution provides a detail-level annotation facility that reflects the user's preferences. The web solution is evaluated by the team as well as via the Amazon Mechanical Turk service, and the annotated output from different users demonstrates its usability in terms of availability and clarity. In addition to data crawling and the development of the annotation web solution, this project analyzes the Instagram and Zalando data distributions in terms of clothing category, subcategory and pattern to provide meaningful insight into the data. The research community will benefit from these datasets when working on richly annotated data that represents a social network and contains detailed clothing information.
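A toy sketch of the kind of predefined hierarchical category/attribute vocabulary and detail-level annotation record the framework describes follows; the taxonomy, field names and interaction signals shown are purely illustrative, not the thesis' actual schema.

```python
# Hierarchical clothing vocabulary plus an annotation record validated against it.
from dataclasses import dataclass, field
from typing import Dict, List

CLOTHING_HIERARCHY: Dict[str, Dict[str, List[str]]] = {
    "tops": {"shirt": ["striped", "plain", "checked"], "t-shirt": ["graphic", "plain"]},
    "bottoms": {"jeans": ["skinny", "straight"], "skirt": ["pleated", "a-line"]},
}

@dataclass
class ItemAnnotation:
    category: str                                           # e.g. "tops"
    subcategory: str                                        # e.g. "shirt"
    attributes: List[str] = field(default_factory=list)     # e.g. ["striped"]

    def is_valid(self) -> bool:
        """Check the annotation against the predefined hierarchy."""
        sub = CLOTHING_HIERARCHY.get(self.category, {})
        return self.subcategory in sub and all(a in sub[self.subcategory]
                                               for a in self.attributes)

@dataclass
class ImageAnnotation:
    image_url: str
    likes: int                                              # a user-interaction signal
    items: List[ItemAnnotation] = field(default_factory=list)

print(ItemAnnotation("tops", "shirt", ["striped"]).is_valid())   # True
```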
|
27 |
Fashion Object Detection and Pixel-Wise Semantic Segmentation: Crowdsourcing framework for image bounding box detection & Pixel-Wise Segmentation
Mallu, Mallu. January 2018 (has links)
Technology has revamped every aspect of our lives, and the fashion industry is one of those facets. Plenty of deep learning architectures are taking shape to augment fashion experiences, and there are numerous possibilities for enhancing fashion technology with deep learning. One key idea is to generate fashion style and recommendations using artificial intelligence; another is to gather reliable information on fashion trends, which includes analysis of existing fashion-related images and data. When dealing specifically with images, localisation and segmentation are well-known approaches for in-depth study of the pixels, objects and labels present in an image. In this master's thesis a complete framework is presented to perform localisation and segmentation on fashionista images, as part of a broader research effort on fashion style detection and recommendation. The developed solution localises fashion items in an image by drawing bounding boxes and labelling them, and also provides pixel-wise semantic segmentation functionality that extracts fashion-item label-pixel data. The collected data can serve as ground truth as well as training data for the targeted deep learning architecture. A study related to localisation and segmentation of videos is also presented. The developed system has been evaluated in terms of flexibility, output quality and reliability compared to similar platforms, and has proven to be a fully functional solution capable of providing essential localisation and segmentation services while keeping the core architecture simple and extensible.
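As a small illustration of how the two annotation types relate, the sketch below derives a tight bounding box and a label-pixel count from a single item's pixel-wise mask; the per-item binary-mask encoding and the record fields are assumptions, not the framework's actual output format.

```python
# From a pixel-wise item mask to a bounding box and a label-pixel count.
import numpy as np

def mask_to_box(mask):
    """mask: 2-D boolean array for a single item. Returns (x_min, y_min, width, height)."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None                                   # empty mask: no box
    x_min, x_max = xs.min(), xs.max()
    y_min, y_max = ys.min(), ys.max()
    return int(x_min), int(y_min), int(x_max - x_min + 1), int(y_max - y_min + 1)

def annotation_record(image_id, label, mask):
    return {"image_id": image_id, "label": label,
            "bbox": mask_to_box(mask), "pixel_count": int(mask.sum())}

mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 3:7] = True
print(annotation_record("img_001", "skirt", mask))    # bbox (3, 2, 4, 3), 12 pixels
```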
|
28 |
Automatic Image Annotation by Sharing Labels Based on Image Clustering / Automatisk bildannotering med hjälp av tagg-delning baserat på bildklustering
Spång, Anton. January 2017 (has links)
The growth of image collections has made manual annotation infeasible, creating a need for accurate and time-efficient image annotation methods. This project evaluates a system for automatic image annotation to see whether it is possible to share annotations between images based on unsupervised clustering. The evaluation includes experiments with different algorithms and different unlabeled data sets. The system is also compared to an award-winning convolutional neural network model, used as a baseline, to see whether the system's precision and/or recall can exceed the baseline model's. The results of the experiments conducted in this work show that precision and recall could be increased on the data used in this thesis: an average increase of 0.094 in precision and 0.049 in recall for the system compared to the baseline.
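A minimal sketch of the label-sharing idea (our reading, not the thesis implementation): images are clustered on visual features and an unannotated image inherits the most frequent tags of the annotated images in its cluster. The feature extractor, cluster count and tag-selection rule are assumptions.

```python
# Share labels within unsupervised clusters: unannotated images receive the most
# common tags of the annotated images assigned to the same cluster.
from collections import Counter
import numpy as np
from sklearn.cluster import KMeans

def share_labels(features, labels, n_clusters=50, top_n=3):
    """features: (n_images, d) array; labels[i] is a list of tags for annotated images
    and None for unannotated ones. Returns {image_index: suggested_tags}."""
    assignments = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=0).fit_predict(features)
    suggestions = {}
    for c in range(n_clusters):
        members = np.where(assignments == c)[0]
        tag_counts = Counter(tag for i in members
                             if labels[i] is not None
                             for tag in labels[i])
        shared = [t for t, _ in tag_counts.most_common(top_n)]
        for i in members:
            if labels[i] is None:          # only unannotated images get shared tags
                suggestions[i] = shared
    return suggestions
```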
|
29 |
Alinhamento texto-imagem em sites de notícias [Text-image alignment in news sites]
Veltroni, Wellington Cristiano. 02 March 2018 (has links)
Text-image alignment is the task of aligning elements in a text with elements in the image accompanying it. In this work, text-image alignment is applied to news sites. Many news items do not make clear the correspondence between elements of the text and elements within the associated image. In this scenario, text-image alignment aims to guide the reader, bringing clarity to the news item and its associated image by making explicit the direct correspondence between regions of the image and words (or named entities) in the text. The goal of this work is to combine Natural Language Processing (NLP) and Computer Vision (CV) techniques to build a text-image aligner for news: the LinkPICS aligner. LinkPICS uses the YOLO convolutional network (CNN) to detect people and objects in the image associated with the news text. Due to the limited number of object classes detected by YOLO (only 80), three other CNNs are used to generate new labels for detected objects. Text-image alignment is divided into two distinct processes: (1) people alignment and (2) object alignment. In people alignment, the named entities identified in the text are aligned with images of people; in the evaluation performed on the Folha de São Paulo International news corpus, in English, LinkPICS obtained a precision of 98%. In object alignment, physical words are aligned with objects (or animals, fruits, etc.) present in the image associated with the news item; in the evaluation performed on the BBC NEWS corpus, also in English, LinkPICS achieved 72% precision. The main contributions of this work are the LinkPICS aligner and the strategy proposed for its implementation, which represent innovations for the NLP and CV areas. Another contribution is the possibility of generating a visual dictionary (words associated with images) containing aligned people and objects, which can be used in other research and applications, such as helping to learn a second language. / Funding: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), grant 133679/2015-2.
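The sketch below gives a highly simplified picture of the people-alignment step described above: PERSON entities found in the text are paired with person detections assumed to come from an object detector such as YOLO. The left-to-right pairing heuristic and the detection format are illustrative assumptions, not LinkPICS itself.

```python
# Pair named entities of type PERSON with detected person boxes from the news image.
import spacy

nlp = spacy.load("en_core_web_sm")       # assumes the small English spaCy model is installed

def align_people(news_text, detections):
    """detections: list of dicts like {"label": "person", "box": (x, y, w, h)},
    assumed to be the output of a detector run on the news image."""
    person_entities = [ent.text for ent in nlp(news_text).ents if ent.label_ == "PERSON"]
    person_boxes = [d["box"] for d in detections if d["label"] == "person"]
    # Naive heuristic: pair entities with boxes left-to-right, in mention order.
    person_boxes.sort(key=lambda box: box[0])
    return list(zip(person_entities, person_boxes))

detections = [{"label": "person", "box": (40, 30, 120, 300)},
              {"label": "person", "box": (220, 35, 110, 290)}]
print(align_people("Barack Obama met Angela Merkel in Berlin.", detections))
```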
|