1 |
Automatic caption generation for news images
Feng, Yansong, January 2011
This thesis is concerned with the task of automatically generating captions for images, which is important for many image-related applications. Automatic description generation for video frames would help security authorities manage and utilize large volumes of monitoring data more efficiently. Image search engines could potentially benefit from image description in supporting more accurate and targeted queries for end users. Importantly, generating image descriptions would aid blind or partially sighted people who cannot access visual information in the same way as sighted people can. However, previous work has relied on fine-grained resources, manually created for specific domains and applications. In this thesis, we explore the feasibility of automatic caption generation for news images in a knowledge-lean way. We depart from previous work, as we learn a model of caption generation from publicly available data that has not been explicitly labelled for our task. The model consists of two components, namely extracting image content and rendering it in natural language. Specifically, we exploit data resources where images and their textual descriptions co-occur naturally. We present a new dataset consisting of news articles, images, and their captions that we acquired from the BBC News website. Rather than laboriously annotating images with keywords, we simply treat the captions as the labels. We show that it is possible to learn the visual and textual correspondence under such noisy conditions by extending an existing generative annotation model (Lavrenko et al., 2003). We also find that the accompanying news documents substantially complement the extraction of the image content.
In order to provide a better modelling and representation of image content, we propose a probabilistic image annotation model that exploits the synergy between visual and textual modalities under the assumption that images and their textual descriptions are generated by a shared set of latent variables (topics). Using Latent Dirichlet Allocation (Blei and Jordan, 2003), we represent visual and textual modalities jointly as a probability distribution over a set of topics. Our model takes these topic distributions into account while finding the most likely keywords for an image and its associated document. The availability of news documents in our dataset allows us to perform the caption generation task in a fashion akin to text summarization, save for one important difference: our model is not solely based on text but uses the image to select content from the document that should be present in the caption. We propose both extractive and abstractive caption generation models to render the extracted image content in natural language without relying on rich knowledge resources, sentence templates or grammars. The backbone for both approaches is our topic-based image annotation model. Our extractive models examine how best to select sentences that overlap in content with our image annotation model. We adapt an existing abstractive headline generation model to our scenario by incorporating visual information. Our own model operates over image description keywords and document phrases by taking dependency and word order constraints into account. Experimental results show that both approaches can generate human-readable captions for news images. Our phrase-based abstractive model manages to yield captions as informative as those written by the BBC journalists.
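The extractive idea in this abstract, choosing the document sentence that overlaps most with the keywords predicted for the image, can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `select_caption`, the keyword probabilities, and the example sentences are hypothetical, and plain word overlap stands in for the thesis's topic-based annotation model, which is not reproduced here.

```python
import re


def select_caption(sentences, keyword_probs):
    """Return the document sentence that best overlaps the image keywords.

    keyword_probs is a hypothetical stand-in for the output of an image
    annotation model: a mapping from keyword to its estimated probability.
    """
    def score(sentence):
        # Lowercase and tokenize, then sum the probabilities of matched keywords.
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        return sum(p for word, p in keyword_probs.items() if word in words)

    return max(sentences, key=score)


# Made-up example data, for illustration only.
sentences = [
    "The prime minister spoke to reporters on Tuesday.",
    "Rescue teams searched the flooded river bank overnight.",
    "Heavy rain caused the river to flood several villages.",
]
keyword_probs = {"flood": 0.4, "river": 0.3, "rescue": 0.2}

caption = select_caption(sentences, keyword_probs)
print(caption)
```

Exact word matching is a deliberate simplification: "flooded" does not match the keyword "flood" here, which is one reason a real system would compare richer representations than surface words.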
|
2 |
VIDEO SCENE DETECTION USING CLOSED CAPTION TEXT
Smith, Gregory, 15 May 2009
Issues in Automatic Video Biography Editing are similar to those in Video Scene Detection and Topic Detection and Tracking (TDT). The techniques of Video Scene Detection and TDT can be applied to interviews to reduce the time necessary to edit a video biography. The system has attacked the problems of extraction of video text, story segmentation, and correlation. This thesis project was divided into three parts: extraction, scene detection, and correlation. The project successfully detected scene breaks in series television episodes and displayed scenes that had similar content.
|
3 |
Leveraging Multimodal Perspectives to Learn Common Sense for Vision and Language Tasks
Lin, Xiao, 05 October 2017
Learning and reasoning with common sense is a challenging problem in Artificial Intelligence (AI). Humans have the remarkable ability to interpret images and text from different perspectives in multiple modalities, and to use large amounts of commonsense knowledge while performing visual or textual tasks. Inspired by that ability, we approach commonsense learning as leveraging perspectives from multiple modalities for images and text in the context of vision and language tasks.
Given a target task (e.g., textual reasoning, matching images with captions), our system first represents input images and text in multiple modalities (e.g., vision, text, abstract scenes and facts). Those modalities provide different perspectives to interpret the input images and text. And then based on those perspectives, the system performs reasoning to make a joint prediction for the target task. Surprisingly, we show that interpreting textual assertions and scene descriptions in the modality of abstract scenes improves performance on various textual reasoning tasks, and interpreting images in the modality of Visual Question Answering improves performance on caption retrieval, which is a visual reasoning task. With grounding, imagination and question-answering approaches to interpret images and text in different modalities, we show that learning commonsense knowledge from multiple modalities effectively improves the performance of downstream vision and language tasks, improves interpretability of the model and is able to make more efficient use of training data.
Complementary to the model aspect, we also study the data aspect of commonsense learning in vision and language. We study active learning for Visual Question Answering (VQA) where a model iteratively grows its knowledge through querying informative questions about images for answers. Drawing analogies from human learning, we explore cramming (entropy), curiosity-driven (expected model change), and goal-driven (expected error reduction) active learning approaches, and propose a new goal-driven scoring function for deep VQA models under the Bayesian Neural Network framework. Once trained with a large initial training set, a deep VQA model is able to efficiently query informative question-image pairs for answers to improve itself through active learning, saving human effort on commonsense annotations.
/ Ph. D. /
Designing systems that learn and reason with common sense is a challenging problem in Artificial Intelligence (AI). Humans have the remarkable ability to interpret images and text from different perspectives in multiple modalities, and to use large amounts of commonsense knowledge while performing visual or textual tasks. Inspired by that ability, we approach commonsense learning as leveraging perspectives from multiple modalities for images and text in the context of vision and language tasks.
Given a target task, our system first represents the input information (e.g., images and text) in multiple modalities (e.g., vision, text, abstract scenes and facts). Those modalities provide different perspectives to interpret the input information. Based on those perspectives, the system performs reasoning to make a joint prediction to solve the target task. Perhaps surprisingly, we show that imagining (generating) abstract scenes behind input textual scene descriptions improves performance on various textual reasoning tasks such as answering fill-in-the-blank and paraphrasing questions, and answering questions about images improves performance on retrieving image captions. Through the use of perspectives from multiple modalities, our system also makes use of training data more efficiently and has a reasoning process that is easy to understand.
Complementary to the system design aspect, we also study the data aspect of commonsense learning in vision and language. We study active learning for Visual Question Answering (VQA). VQA is the task of answering open-ended natural language questions about images. In active learning for VQA, a model iteratively grows its knowledge through querying informative questions about images for answers. Inspired by human learning, we explore cramming (entropy), curiosity-driven (expected model change), and goal-driven (expected error reduction) active learning approaches, and propose a new goal-driven query selection function. We show that once initialized with a large training set, a VQA model is able to efficiently query informative question-image pairs for answers to improve itself through active learning, saving human effort on commonsense annotations.
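The "cramming" (entropy) strategy named in this abstract can be illustrated with a minimal sketch: rank unlabeled question-image pairs by the entropy of the model's predicted answer distribution and query the most uncertain ones. The pool contents, the identifiers `q1` to `q3`, and the function names are assumptions made for illustration; the thesis's goal-driven scoring function is not reproduced here.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete answer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(pool, k):
    """Pick the k candidates whose predicted answers are most uncertain.

    pool maps a candidate id (a hypothetical question-image pair) to the
    model's predicted probability distribution over answers.
    """
    return sorted(pool, key=lambda cid: entropy(pool[cid]), reverse=True)[:k]


# Made-up predictions, for illustration only.
pool = {
    "q1": [0.90, 0.05, 0.05],   # confident prediction, low entropy
    "q2": [1 / 3, 1 / 3, 1 / 3],  # maximally uncertain
    "q3": [0.60, 0.30, 0.10],
}

chosen = select_queries(pool, 2)
print(chosen)
```

Entropy only measures how unsure the model is about a single example; the curiosity-driven and goal-driven criteria described in the abstract instead estimate how much the model or its error would change after receiving the answer.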
|
4 |
Imagens do massacre do Realengo: a função informativa da legenda fotográfica nos jornais impressos / Images of the massacre of Realengo: the informative function of the photographic captions in printed newspapers
Zarattini, Mônica Rolim, 23 September 2013
The goal of this research was to investigate the informative function of photographic captions in the case of the Realengo Massacre, analyzing the relationship between verbal and visual language in contemporary journalism. It examines the contexts in which journalistic photography circulates as discourse through the media, whose reach is expanding through new technologies. The corpus comprised 39 front pages of the printed editions of Brazilian newspapers dated 8 April 2011, used as a case study of the tragedy known as the Realengo Massacre. The main methods were the iconographic analysis methodologies of Kossoy and Gervereau, compared with an analysis of the information units (who, where, when, what and how) contained in the photographic captions, inspired by the methodologies of Morin and Santos. A questionnaire about photographic captions was also answered by journalists in the newsrooms of the newspapers studied (editors-in-chief, executive editors, newsroom directors or photo editors). It was found that more than 50% of the newspapers published the photographs of the case with their respective captions. By separating each information unit, it was verified how the informative function in each case promoted the dialogue between the imagistic sense of the photograph and the logical sense of the text. The other half of the published photographs predominantly formed narratives that facilitated their entry into the instance of the live image, a concept developed by Bucci.
It was finally concluded that the photographic caption, as a visual unit easily perceived by the reader, fulfilled its informative function of supporting the meaning of the iconographic image in dialogue with other text modules of the newspaper, in general under the following trends: 1) some captions, written on the basis of the information units, supported the meaning of the image; 2) some captions contained only a description of what could be seen in the image and therefore gave no support to its meaning; 3) others were written with information unrelated to the images and supported the meaning of the report as a whole and of the sensationalist message that most newspapers intended to convey.
|
5 |
Otimização dos custos de energia elétrica na programação do armazenamento e distribuição de água em redes urbanas / Minimization of the electrical energy cost in water distribution networks
Soler, Edilaine Martins, 22 February 2008
The problem addressed in this research is the distribution of water in urban networks to meet known demands, with the objective of minimizing the cost of the electrical energy needed to operate hydraulic pumps. The pumps draw water from artesian wells or water treatment stations to supply reservoirs distributed across the districts of a city, from which the population is served by gravity. Because the cost of electrical energy varies over the day, pump operation must be scheduled so that the pumps are not switched on during the hours when energy is most expensive. The problem of water stock scheduling in tanks (WSST; PPEAR in the Portuguese original) consists of deciding in which periods, or fractions of periods, of the planning horizon the pumps that supply the reservoirs should remain on, and in which periods, or fractions of periods, water should be transferred between reservoirs, so that the demand of each reservoir is met in every period and the minimum and maximum water levels in the reservoirs are respected. A heuristic solution to the WSST is proposed and analyzed by comparison with the solutions obtained by an implicit enumeration (branch-and-bound) method. Computational results confirm the efficiency of the approach, both in the quality of its solutions and in its low response time.
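A toy version of the scheduling problem in this abstract may help make the constraints concrete. The sketch below assumes a single reservoir, one pump that is either on or off for a whole period, and made-up tariffs and demands; it enumerates every on/off plan exhaustively, which is neither the heuristic nor the pruned enumeration used in the thesis, and all names and numbers are hypothetical.

```python
from itertools import product


def cheapest_schedule(tariff, demand, inflow, level0, lo, hi):
    """Exhaustively search on/off pump plans for one reservoir.

    tariff[t]  energy cost of running the pump during period t
    demand[t]  water drawn from the reservoir during period t
    inflow     water added per period when the pump is on
    level0     initial reservoir level; lo/hi are the level bounds
    Returns (cost, plan) for the cheapest feasible plan, or (None, None).
    """
    best_cost, best_plan = None, None
    for plan in product((0, 1), repeat=len(tariff)):
        level, cost, feasible = level0, 0.0, True
        for on, price, drawn in zip(plan, tariff, demand):
            level += on * inflow - drawn
            if not lo <= level <= hi:  # level bounds violated in this period
                feasible = False
                break
            cost += on * price
        if feasible and (best_cost is None or cost < best_cost):
            best_cost, best_plan = cost, plan
    return best_cost, best_plan


# Made-up data: energy is cheap in the first two periods, expensive afterwards.
result = cheapest_schedule(
    tariff=[1, 1, 5, 5], demand=[2, 2, 2, 2],
    inflow=4, level0=2, lo=0, hi=6,
)
print(result)
```

With this data the cheapest feasible plan pumps only during the two cheap periods and lets the stored water cover the expensive ones. Exhaustive search grows as 2^T in the number of periods, which is why heuristics and pruned enumeration matter for realistic horizons and multiple reservoirs.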
|
8 |
memeBot: Automatic Image Meme Generation for Online Social Interaction
January 2020
Internet memes have become a widespread tool used by people for interacting and exchanging ideas over social media, blogs, and open messengers. Internet memes most commonly take the form of an image that combines picture, text, and humor, making them a powerful tool to deliver information. Image memes are used in viral marketing and mass advertising to propagate ideas ranging from simple commercials to those that can drive change and development in social structures, such as countering hate speech.
This work proposes to treat automatic image meme generation as a translation process, and further presents an end-to-end neural and probabilistic approach to generate an image-based meme for any given sentence using an encoder-decoder architecture. For a given input sentence, a meme is generated by combining a meme template image and a text caption, where the meme template image is selected from a set of popular candidates using a selection module and the meme caption is generated by an encoder-decoder model. An encoder is used to map the selected meme template and the input sentence into a meme embedding space, and then a decoder is used to decode the meme caption from the meme embedding space. The generated natural language caption is conditioned on the input sentence and the selected meme template.
The model learns the dependencies between the meme captions and the meme template images and generates new memes using the learned dependencies. The quality of the generated captions and the generated memes is evaluated through both automated metrics and human evaluation. An experiment is designed to score how well the generated memes can represent popular tweets from Twitter conversations. Experiments on Twitter data show the efficacy of the model in generating memes capable of representing a sentence in online social interaction. / Dissertation/Thesis / Masters Thesis Computer Science 2020
|
9 |
Partial and Synchronized Caption to Foster Second Language Listening based on Automatic Speech Recognition Clues / 第二言語のリスニング訓練のための自動音声認識を用いた部分的かつ同期された字幕付与
Maryam, Sadat Mirzaei, 23 March 2017
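Kyoto University / 0048 / New-system doctorate by coursework / Doctor of Informatics / Kō No. 20505 / Jōhaku No. 633 / 新制||情||110 (University Library) / Kyoto University Graduate School of Informatics, Department of Intelligence Science and Technology / (Chief examiner) Prof. Tatsuya Kawahara, Prof. Sadao Kurohashi, Prof. Masatake Dantsuji / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM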
|
10 |
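Intimitet och emotionell kommunikation via Instagram : en kvalitativ studie om influencers sätt att kommunicera [Intimacy and emotional communication via Instagram: a qualitative study of influencers' ways of communicating]
Kadmark, Louise, January 2019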
Social media have become an integrated part of our everyday lives, and profiles constantly work on creating identity, representation and interaction. A search of previous research revealed a lack of studies on the design and influence of communication on Instagram, particularly the intimate and emotional communication sent out by influencers. The study aims to provide an understanding of the self-representational aspects, and also of how emotion and the humanization of the channel contribute to the otherwise distanced close contact with followers. The thesis highlights the importance of interaction in the relationship between sender and receiver, and studies modalities such as speech acts and forms of address, with their possible significance for the perceived influence on an intended audience. To obtain a fair qualitative result, a multimodal analysis with elements of critical linguistics was used. The material was collected digitally and consists of two different Instagram posts from each of three influencers, where the design of the posts is examined using a coding scheme with guiding questions covering both text and image elements. The interplay between image and text is also studied. The results point to the recurring concepts of identity creation and affirmation, which appear to be decisive for symbolic interaction and for maintaining the channel. Self-presentation is vital, and the intimate, emotional communication contributes to an ever closer, more personal contact with the intended audience. Communication plays an important role primarily in creating meaning and in conveying the intended message linked to emotion.
|