The qualitative assessment of image content and aesthetic impression is affected by various image attributes and the relations between them. Modelling such assessments as objective rankings, and learning image representations from them, is not a straightforward problem. The assessment criteria vary in complexity from one application to another; a highly complex problem can involve a large number of interrelated attributes and features alongside varied rules. Fashion interpretation is an example of such an application. In this case, attribute recognition can be used to automatically label different aspects of an image, such as clothing and body shape. The presence or absence of objects in the image is therefore unambiguous, and a similarity measure can be established between images. It is, however, not clear how to establish such a measure between the aesthetic impressions the images make.

As a first contribution, an approach for ranking images by pooling the knowledge and experience of crowdsourced annotators is presented, addressing the highly subjective and complex problem of fashion interpretation and the assessment of the aesthetic qualities of images. To utilize the visual judgements, a novel dataset is introduced, complete with labellings of various attributes of clothing and body shapes. Large-scale pairwise comparisons, of the order of tens of thousands, are performed by annotators who follow fashion. Various consistency measures are then applied to verify agreement and correlation between the annotators and to rule out inconsistencies among them. Based on the annotations, reliable rankings are established for automatically comparing images according to fashion rules. Bag of Visual Words object recognition is then used to classify the attributes. By incorporating the annotator rankings from the first stage and these classification estimates into a lookup model for automatic assessment, pairwise comparisons of images can be performed automatically. Each visual attribute of clothing and body shape is represented within the rankings.

Next, the rankings obtained from the crowdsourcing procedure are incorporated into several matching approaches to achieve a matching-based ranking. Nearest-neighbour matches can be found for a pair of test images and compared with the annotator rankings, which establishes which configuration in an image pair is ranked higher. In particular, two prominent approaches, Bag of Visual Words and Local Descriptor Matching, are employed to facilitate an evaluation. Several random splits of the proposed dataset are used to form the training and test sets, and the matches obtained are incorporated into an approach introduced to generate a global ranking. The evaluation from this stage serves as a comparative baseline for the approach proposed next, in which a learning procedure based on graphical modelling captures the annotator rankings.

Finally, a novel approach for learning an image representation based on qualitative assessments of visual aesthetics is proposed. It relies on a multi-node, multi-state model that represents image attributes and their relations, and is learnt from pairwise image preferences provided by the annotators. To demonstrate its effectiveness, the approach is applied to fashion image rating, i.e., the comparative assessment of aesthetic qualities. The attributes and their relations are assigned learnt potentials, which are used to rate the images. Evaluation of the representation model demonstrates high performance in ranking fashion images.
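The abstract does not specify which consistency measures were applied to the annotators; as an illustrative sketch, Cohen's kappa is one standard chance-corrected agreement statistic for two annotators giving binary pairwise preferences. The preference data below is hypothetical.

```python
# Illustrative sketch: Cohen's kappa as one possible consistency measure
# between two annotators; not necessarily the measure used in the thesis.
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators' preferences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n)
                   for c in set(labels_a) | set(labels_b))
    if expected == 1.0:          # both annotators are constant; treat as full agreement
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Hypothetical labels: 0 = first image of the pair preferred, 1 = second.
ann1 = [0, 1, 1, 0, 1, 0, 0, 1]
ann2 = [0, 1, 0, 0, 1, 0, 1, 1]
print(f"kappa = {cohen_kappa(ann1, ann2):.3f}")   # kappa = 0.500
```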
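Likewise, the mechanism for aggregating pairwise comparisons into a global ranking is not detailed here; a Bradley-Terry-style model is one common choice. The sketch below, with made-up comparison outcomes, shows the general idea of fitting per-image strengths from (winner, loser) pairs and sorting by strength.

```python
# Sketch of one possible pairwise-to-global aggregation (Bradley-Terry,
# fitted by minorisation-maximisation); not necessarily the thesis method.
import numpy as np

def bradley_terry(n_items, comparisons, iters=200):
    """Estimate item strengths from (winner, loser) comparison pairs.

    Assumes every item wins at least once and the comparison graph is
    connected, so the MM updates stay well defined.
    """
    wins = np.zeros(n_items)            # w_i: total wins of item i
    n = np.zeros((n_items, n_items))    # n_ij: comparisons between i and j
    for w, l in comparisons:
        wins[w] += 1
        n[w, l] += 1
        n[l, w] += 1
    p = np.ones(n_items) / n_items
    for _ in range(iters):
        denom = (n / (p[:, None] + p[None, :])).sum(axis=1)
        p = wins / denom
        p /= p.sum()                    # strengths are scale-invariant
    return p

# Hypothetical outcomes over four images, indexed 0..3.
comps = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 1), (2, 3), (0, 2), (1, 3)]
scores = bradley_terry(4, comps)
print(np.argsort(-scores))              # global ranking, best first
```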
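Finally, the multi-node, multi-state model assigns learnt potentials to attributes and their relations. The toy scoring routine below, with invented attribute nodes, states, and potential values, illustrates how such potentials could rate two images and decide a pairwise preference; the actual model structure and learning procedure are those of the thesis, not this sketch.

```python
# Toy illustration of rating images with unary and pairwise potentials.
# All nodes, states, and potential values here are hypothetical.
unary = {
    ("sleeve", "long"): 0.4, ("sleeve", "short"): 0.1,
    ("fit", "loose"): 0.3,   ("fit", "tight"): 0.2,
    ("body", "slim"): 0.1,   ("body", "curvy"): 0.2,
}
pairwise = {
    (("fit", "loose"), ("body", "curvy")): 0.5,   # relation the model rewards
    (("fit", "tight"), ("body", "slim")): 0.4,
    (("sleeve", "long"), ("fit", "loose")): 0.2,
}

def rate(states):
    """Score an image as the sum of unary and pairwise potentials."""
    score = sum(unary.get((node, s), 0.0) for node, s in states.items())
    keys = list(states.items())
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            a, b = keys[i], keys[j]
            score += pairwise.get((a, b), 0.0) + pairwise.get((b, a), 0.0)
    return score

img_a = {"sleeve": "long", "fit": "loose", "body": "curvy"}
img_b = {"sleeve": "short", "fit": "tight", "body": "curvy"}
print("prefer A" if rate(img_a) > rate(img_b) else "prefer B")
```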
Identifier | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:647907 |
Date | January 2015 |
Creators | Gaur, Aarushi |
Contributors | Mikolajczyk, Krystian |
Publisher | University of Surrey |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://epubs.surrey.ac.uk/807456/ |