1

Large-scale Affective Computing for Visual Multimedia

Jou, Brendan Wesley. January 2016.
In recent years, Affective Computing has arisen as a prolific interdisciplinary field for engineering systems that integrate human affect. While human-computer relationships have long revolved around cognitive interactions, it is becoming increasingly important to account for human affect, that is, feelings and emotions, in order to avert user frustration, provide disability services, predict the virality of social media content, and more. In this thesis, we focus on Affective Computing as it applies to large-scale visual multimedia, in particular still images, animated image sequences and video streams, above and beyond the traditional approaches of facial expression and gesture recognition. By taking a principled, psychology-grounded approach, we seek to paint a more holistic and colorful view of computational affect in the context of visual multimedia. For example, should emotions like 'surprise' and 'fear' be assumed to be orthogonal output dimensions? Does a 'positive' image in one culture's view elicit the same feelings of positivity in another culture? We study affect frameworks and ontologies to define, organize and develop machine learning models, with such questions in mind, that automatically detect affective visual concepts.

In the push for what we call "Big Affective Computing," we focus on two dimensions of scale for affect, scaling up and scaling out, and propose that both are imperative if the Affective Computing problem is to be scaled successfully. Intuitively, simply increasing the number of data points corresponds to "scaling up." Less intuitively, problems like Affective Computing can also "scale out," or diversify. We show that this latter dimension of introducing data variety, alongside the former of introducing data volume, can yield particular insights, since human affect naturally departs from traditional Machine Learning and Computer Vision problems where there is an objectively truthful target. No one might debate that a picture of a 'dog' should be tagged as a 'dog,' but not everyone may agree that it looks 'ugly.' We present extensive discussions on why scaling out is critical and how it can be accomplished in the context of large-volume visual data. At a high level, the main contributions of this thesis include:

Multiplicity of Affect Oracles: Prior to the work in this thesis, little consideration had been paid to the affective label-generating mechanism when learning functional mappings between inputs and labels. Throughout this thesis, but first in Chapter 2, starting in Section 2.1.2, we make a case for a conceptual partitioning of the affect oracle governing the label generation process in Affective Computing problems, resulting in a multiplicity of oracles, whereas prior works assumed a single universal oracle. In Chapter 3, the differences among intended, expressed, induced and perceived emotion are discussed, and we argue that perceived emotion is particularly well suited for scaling up because its more objective nature, compared to other affect states, reduces label variance. In Chapters 4 and 5, a division of the affect oracle along cultural lines, with manifestations in both language and geography, is explored. We accomplish all of this without sacrificing the 'scale up' dimension, and we tackle significantly larger-volume problems than prior comparable visual Affective Computing research.

Content-driven Visual Affect Detection: Traditionally, most Affective Computing work uses psycho-physiological signals from subjects viewing the stimuli of interest, e.g., a video advertisement, as the system inputs for prediction tasks. In essence, this means the machine learns to label a proxy signal rather than the stimuli themselves. In this thesis, with the rise of strong Computer Vision and Multimedia techniques, we focus on learning to label the stimuli directly, without a biometric proxy signal provided by a human subject (except in the unique circumstances of Chapter 7). This shift toward learning from the stimuli directly is important because it allows us to scale up with much greater ease: biometric measurement acquisition is both low-throughput and somewhat invasive, while stimuli are often readily available. In addition, learning directly from the stimuli allows researchers to determine precisely which low-level features in the stimuli are actually coupled with affect states, e.g., which set of frames caused viewer discomfort, rather than only a broad sense that a video was discomforting. In Part I of this thesis, we illustrate an emotion prediction task with a psychology-grounded affect representation. In particular, in Chapter 3, we develop a prediction task over semantic emotion classes, e.g., 'sad,' 'happy' and 'angry,' using animated image sequences with annotations from over 2.5 million users. Subsequently, in Part II, we develop visual sentiment and adjective-based semantics models from million-scale digital imagery mined from a social multimedia platform.

Mid-level Representations for Visual Affect: While discrete semantic emotions and sentiment are classical representations of affect with decades of psychology grounding, the interdisciplinary nature of Affective Computing, now only about two decades old, allows for new avenues of representation. Mid-level representations have been proposed in numerous Computer Vision and Multimedia problems as an intermediate, and often more computable, step toward bridging the semantic gap between low-level system inputs and high-level semantic label abstractions. In Part II, inspired by this work, we adapt it for vision-based Affective Computing and adopt a semantic construct called adjective-noun pairs. Specifically, in Chapter 4, we explore the use of adjective-noun pairs in the context of a social multimedia platform and develop a multilingual visual sentiment ontology with over 15,000 affective mid-level visual concepts across 12 languages, associated with over 7.3 million images and representations from over 235 countries, resulting in the largest affective digital image corpus to date in both depth and breadth. In Chapter 5, we develop computational methods to predict such adjective-noun pairs and explore their usefulness in traditional sentiment analysis, but from a previously unexplored cross-lingual perspective. In Chapter 6, we propose a new learning setting called 'cross-residual learning,' building on recent successes in deep neural networks and, specifically, residual learning; we show that cross-residual learning can be used effectively to learn jointly across multiple related tasks: object detection (nouns), more traditional affect modeling (adjectives), and affective mid-level representations (adjective-noun pairs). This gives us a framework for grounding the adjective-noun pair bridge in both vision and affect simultaneously.
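To make the cross-residual idea above more concrete, here is a minimal, hypothetical PyTorch-style sketch of joint noun, adjective, and adjective-noun-pair (ANP) prediction with cross-task residual connections. It is not the thesis's actual architecture: the backbone, layer sizes, class counts, and the exact placement of the cross connections are illustrative assumptions.

```python
# Hypothetical sketch: joint noun / adjective / ANP prediction where each task
# branch adds a residual on top of a shared feature, and the ANP branch also
# receives residual contributions from the noun and adjective branches.
# All sizes and class counts are placeholders, not the thesis's actual model.
import torch
import torch.nn as nn

class CrossResidualANP(nn.Module):
    def __init__(self, feat_dim=512, n_nouns=100, n_adjs=50, n_anps=500):
        super().__init__()
        # Shared backbone producing one feature vector per image.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Task-specific residual branches.
        self.noun_branch = nn.Linear(feat_dim, feat_dim)
        self.adj_branch = nn.Linear(feat_dim, feat_dim)
        self.anp_branch = nn.Linear(feat_dim, feat_dim)
        # Per-task classifiers.
        self.noun_head = nn.Linear(feat_dim, n_nouns)
        self.adj_head = nn.Linear(feat_dim, n_adjs)
        self.anp_head = nn.Linear(feat_dim, n_anps)

    def forward(self, x):
        shared = self.backbone(x)
        # Each branch learns a residual over the shared feature; the ANP branch
        # additionally sums in the sibling-task features (the "cross" residuals).
        noun_feat = shared + self.noun_branch(shared)
        adj_feat = shared + self.adj_branch(shared)
        anp_feat = shared + self.anp_branch(shared) + noun_feat + adj_feat
        return self.noun_head(noun_feat), self.adj_head(adj_feat), self.anp_head(anp_feat)

# Example forward pass with a random batch of two 64x64 RGB images:
# model = CrossResidualANP()
# noun_logits, adj_logits, anp_logits = model(torch.randn(2, 3, 64, 64))
# Joint training would typically sum one cross-entropy loss per task.
```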
2

Avaliação do instrumento SAM para a etiquetagem de atributos afetivos de imagens em ambiente web (Evaluation of the SAM instrument for tagging affective attributes of images in a web environment)

Oliveira, Wellton Costa de. 29 August 2011.
In an image retrieval system, queries over image semantics can range from simple, objective statements such as "find pictures with a boat" to more abstract ones such as "find pictures depicting a happy atmosphere," which are clearly more subjective and difficult to capture from low-level image attributes. This abstract level usually refers to the affective or emotional content of images and is a relevant dimension in which users specify their queries; due to its inherent complexity, however, affect is difficult to model and therefore difficult for a retrieval system to handle. A broadly accepted psychological method for capturing the affective attributes of images is based on the differential scales of valence (pleasant to unpleasant) and arousal (excited to calm), used together with the Self-Assessment Manikin (SAM) instrument. In this work, we present an evaluation of the SAM instrument for tagging images according to affective criteria in a web environment. To perform these studies and collect as many user tags as possible, we developed a live web application named Get Your Mood (GYM) (http://getyourmood.com), which employs the metaphor of a mood test to stimulate users' curiosity and attract volunteers to perform valence/arousal tagging.
Approximately 80 tags were collected for each of the 40 images pre-selected from the International Affective Picture System (IAPS). Statistical analysis indicated no significant difference between the IAPS valence/arousal mean values and those obtained with GYM. This provides evidence that the SAM instrument remains valid even when applied in a medium and under conditions different from those established in its original protocol. We therefore conclude that the valence/arousal dimensions, together with the SAM instrument, can be used as an efficient approach to the affective tagging of images on the web. In addition, we built a database of 104 images with 50 valence/arousal tags collected for each one. Unlike the IAPS, the images in this new database, named Open Affective Images (OPAFI), can be published without restriction, making OPAFI and its associated affective scores a useful resource for further research in both the computer vision and psychology communities.
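For illustration only, the following minimal Python sketch shows the kind of per-image comparison described above: web-collected SAM ratings for one image are compared against the published IAPS mean. The one-sample t-test, the significance level, and the example data are assumptions; the abstract does not specify the exact statistical procedure used.

```python
# Hypothetical sketch: test whether web-collected SAM ratings for a single image
# differ significantly from its published IAPS reference mean (valence or arousal).
# The one-sample t-test is an assumed choice of test, not stated in the abstract.
import numpy as np
from scipy import stats

def compare_to_iaps(ratings, iaps_mean, alpha=0.05):
    """ratings: SAM scores (1-9 scale) collected via the web tool for one image.
    iaps_mean: published IAPS mean for the same image and dimension."""
    t_stat, p_value = stats.ttest_1samp(ratings, iaps_mean)
    return {
        "gym_mean": float(np.mean(ratings)),
        "iaps_mean": iaps_mean,
        "t": float(t_stat),
        "p": float(p_value),
        "significantly_different": bool(p_value < alpha),
    }

# Example with made-up ratings for one image (the study collected ~80 tags per image):
example = compare_to_iaps(np.random.default_rng(0).integers(1, 10, size=80), iaps_mean=6.2)
print(example)
```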
