About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Visual Saliency Analysis on Fashion Images Using Image Processing and Deep Learning Approaches

Neupane, Aashish 01 December 2020 (has links)
ABSTRACT: AASHISH NEUPANE, for the Master of Science degree in BIOMEDICAL ENGINEERING, presented in July 2020 at Southern Illinois University Carbondale. TITLE: VISUAL SALIENCY ANALYSIS ON FASHION IMAGES USING IMAGE PROCESSING AND DEEP LEARNING APPROACHES. MAJOR PROFESSOR: Dr. Jun Qin.
State-of-the-art computer vision technologies have been applied to fashion in multiple ways, and saliency modeling is one of those applications. In computer vision, a saliency map is a 2D topological map that indicates the probabilistic distribution of visual attention priorities. This study focuses on analyzing visual saliency on fashion images using multiple saliency models, evaluated with several metrics. A human subject study was conducted to collect viewers' visual attention on 75 fashion images. Binary ground-truth fixation maps for these images were created from the experimentally collected attention data using a Gaussian blurring function. Saliency maps for the 75 fashion images were generated using multiple conventional saliency models as well as state-of-the-art deep feature-based models. DeepFeat was studied extensively, with 44 sets of saliency maps exploiting features extracted from GoogLeNet and ResNet50. Seven other saliency models were also used to predict saliency maps on these images. The results were compared over five evaluation metrics: AUC, CC, KL divergence, NSS and SIM. The performance of all eight saliency models in predicting visual attention on fashion images was comparable to the benchmarked scores across all five metrics. Furthermore, the models perform consistently well over multiple evaluation metrics, indicating that saliency models could be applied to effectively predict salient regions in arbitrary fashion advertisement images.
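The evaluation pipeline this abstract describes (Gaussian-blurred fixation maps compared against predicted saliency) follows standard saliency-benchmarking practice. As a minimal sketch in Python with NumPy/SciPy, not taken from the thesis, the fixation-map construction and two of the named metrics (NSS and CC) might look like this; the function names and the blur sigma are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_map(points, shape, sigma=25):
    """Blur binary fixation points into a continuous ground-truth map.
    points: iterable of (row, col) fixation locations."""
    fmap = np.zeros(shape, dtype=float)
    for y, x in points:
        fmap[y, x] = 1.0
    return gaussian_filter(fmap, sigma=sigma)

def nss(saliency, points):
    """Normalized Scanpath Saliency: mean z-scored saliency at fixations."""
    s = (saliency - saliency.mean()) / (saliency.std() + 1e-8)
    return float(np.mean([s[y, x] for y, x in points]))

def cc(saliency, fmap):
    """Pearson correlation between a saliency map and a fixation map."""
    return float(np.corrcoef(saliency.ravel(), fmap.ravel())[0, 1])
```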
2

A computational model of visual attention

Chilukamari, Jayachandra January 2017 (has links)
Visual attention is the process by which the Human Visual System (HVS) selects the most important information from a scene. Visual attention models are computational or mathematical models developed to predict this information. The state-of-the-art visual attention models are limited in terms of prediction accuracy and computational complexity, and despite a significant amount of active research in this area, modelling visual attention remains an open research challenge. This thesis proposes a novel computational model of visual attention that achieves higher prediction accuracy with low computational complexity. A new bottom-up visual attention model based on in-focus regions is proposed. To develop the model, an image dataset is created by capturing images with in-focus and out-of-focus regions. The Discrete Cosine Transform (DCT) spectrum of these images is investigated qualitatively and quantitatively to discover the key frequency coefficients that correspond to the in-focus regions. The model detects these key coefficients by formulating a novel relation between the in-focus and out-of-focus regions in the frequency domain, and uses them to detect the salient in-focus regions. Simulation results show that this attention model achieves good prediction accuracy with low complexity. The prediction accuracy of the proposed in-focus visual attention model is further improved by incorporating the sensitivity of the HVS towards the image centre and human faces, while the computational complexity is further reduced by using the Integer Cosine Transform (ICT). The model is parameter-tuned using a hill-climbing approach to optimise accuracy. Its performance has been analysed qualitatively and quantitatively on two large image datasets with eye-tracking fixation ground truth. The results show that the model achieves higher prediction accuracy with lower computational complexity than the state-of-the-art visual attention models. The proposed model is useful for predicting human fixations in computationally constrained environments, chiefly in applications such as perceptual video coding, image quality assessment, object recognition and image segmentation.
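The thesis derives specific key DCT coefficients; the abstract does not list them, so the sketch below only illustrates the general premise, under the common assumption that in-focus blocks retain a larger share of high-frequency DCT energy than defocused ones. The block size and energy measure are placeholders, not the thesis's formulation:

```python
import numpy as np
from scipy.fft import dctn

def infocus_saliency(gray, block=16):
    """Score each block by the fraction of its DCT energy outside the
    DC term; sharp (in-focus) blocks score higher than defocused ones."""
    gray = gray.astype(float)
    h, w = gray.shape
    out = np.zeros((h // block, w // block))
    for i in range(h // block):
        for j in range(w // block):
            patch = gray[i*block:(i+1)*block, j*block:(j+1)*block]
            c = dctn(patch, norm='ortho')
            total = np.sum(c ** 2) + 1e-8
            c[0, 0] = 0.0                     # drop the DC (mean) term
            out[i, j] = np.sum(c ** 2) / total
    return out
```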
3

Visual Saliency Application in Object Detection for Search Space Reduction

January 2017 (has links)
abstract: Vision is the ability to see and interpret any visual stimulus. It is one of the most fundamental and complex tasks the brain performs, and its complexity can be appreciated from the fact that close to 50% of the human brain is dedicated to vision. The brain receives an overwhelming amount of sensory information from the retina, estimated at up to 100 Mbps per optic nerve. Parallel processing of the entire visual field in real time is likely impossible for even the most sophisticated brains due to the high computational complexity of the task [1]. Yet organisms can efficiently process this information to parse complex scenes in real time. This amazing feat of nature relies on selective attention, which allows the brain to filter sensory information and select only a small subset of it for further processing. Today, computer vision has become ubiquitous in our society, with applications in image understanding, medicine, drones, self-driving cars and many more. With the advent of GPUs and the availability of huge datasets like ImageNet, Convolutional Neural Networks (CNNs) have come to play a very important role in solving computer vision tasks such as object detection. However, the size of the networks becomes prohibitive when higher accuracies are needed, which in turn demands more hardware; this hinders the application of CNNs to mobile platforms and keeps them from hitting the real-time mark. The computational efficiency of a computer vision task like object detection can be enhanced by adopting a selective attention mechanism. In this work, this idea is explored by using the Visual Proto-Object Saliency algorithm [1] to crop out the areas of an image without relevant objects before a computationally intensive network like Faster R-CNN [2] processes it. / Dissertation/Thesis / Masters Thesis Electrical Engineering 2017
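The proto-object saliency algorithm itself [1] is beyond this listing, but the cropping step the abstract describes (discarding image areas without relevant objects before the detector runs) reduces to a bounding box over a thresholded saliency map. A hedged sketch, with an assumed threshold and margin rather than the thesis's settings:

```python
import numpy as np

def salient_crop(image, saliency, thresh=0.5, margin=16):
    """Crop the image to the bounding box of above-threshold saliency,
    so a heavy detector (e.g. Faster R-CNN) sees a smaller input."""
    mask = saliency >= thresh * saliency.max()
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return image                          # nothing salient: keep full frame
    y0 = max(ys.min() - margin, 0)
    y1 = min(ys.max() + margin, image.shape[0] - 1)
    x0 = max(xs.min() - margin, 0)
    x1 = min(xs.max() + margin, image.shape[1] - 1)
    return image[y0:y1 + 1, x0:x1 + 1]
```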
4

Nonparametric Neighbourhood Based Multiscale Model for Image Analysis and Understanding

Jain, Aanchal 24 August 2012 (has links)
Image processing applications such as image denoising, image segmentation, object detection, object recognition and texture synthesis often require a multi-scale analysis of images. This is useful because different features in the image become prominent at different scales. Traditional imaging models, which have been used for multi-scale analysis of images, have several limitations such as high sensitivity to noise and structural degradation observed at higher scales. Parametric models make certain assumptions about the image structure which may or may not be valid in several situations. Non-parametric methods, on the other hand, are very flexible and adapt to the underlying image structure more easily. It is highly desirable to have efficient non-parametric models for image analysis, which can be used to build robust image processing algorithms with little or no prior knowledge of the underlying image content. In this thesis, we propose a non-parametric pixel neighbourhood based framework for multi-scale image analysis and apply the model to build image denoising and saliency detection algorithms for the purpose of illustration. It has been shown that the algorithms based on this framework give competitive results without using any prior information about the image statistics.
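The abstract does not specify the framework's exact form, so the following is only a generic illustration of the non-parametric neighbourhood idea it builds on: estimating a pixel from other pixels whose surrounding patches look similar (a non-local-means-style scheme), at a single scale. The patch/search sizes and smoothing parameter are arbitrary assumptions, and the nested loops are deliberately unoptimised:

```python
import numpy as np

def nlm_denoise(gray, patch=3, search=7, h=0.1):
    """Each pixel becomes a weighted average of nearby pixels whose
    surrounding patches resemble its own; assumes intensities in [0, 1]."""
    pr, sr = patch // 2, search // 2
    pad = pr + sr
    padded = np.pad(gray.astype(float), pad, mode='reflect')
    out = np.zeros(gray.shape, dtype=float)
    for y in range(gray.shape[0]):
        for x in range(gray.shape[1]):
            cy, cx = y + pad, x + pad
            ref = padded[cy-pr:cy+pr+1, cx-pr:cx+pr+1]
            weights, values = [], []
            for dy in range(-sr, sr + 1):
                for dx in range(-sr, sr + 1):
                    ny, nx = cy + dy, cx + dx
                    cand = padded[ny-pr:ny+pr+1, nx-pr:nx+pr+1]
                    d2 = np.mean((ref - cand) ** 2)   # patch dissimilarity
                    weights.append(np.exp(-d2 / (h * h)))
                    values.append(padded[ny, nx])
            w = np.asarray(weights)
            out[y, x] = np.dot(w, np.asarray(values)) / w.sum()
    return out
```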
5

Subjective and Objective Evaluation of Visual Attention Models

January 2016 (has links)
abstract: Visual attention (VA) is the study of mechanisms that allow the human visual system (HVS) to selectively process relevant visual information. This work focuses on the subjective and objective evaluation of computational VA models, both for the distortion-free case and in the presence of image distortions. Existing VA models are traditionally evaluated using VA metrics that quantify the match between predicted saliency and fixation data obtained from eye-tracking experiments on human observers. Though there is a considerable number of objective VA metrics, no study has validated that these metrics are adequate for the evaluation of VA models. This work constructs a VA Quality (VAQ) Database by subjectively assessing the prediction performance of VA models on distortion-free images. Additionally, shortcomings in existing metrics are discussed through illustrative examples, and a new metric is proposed that uses local weights based on fixation density and overcomes these flaws. The proposed VA metric outperforms all other popular existing metrics in terms of correlation with subjective ratings. In practice, image quality is affected by a host of factors at several stages of the image processing pipeline, such as acquisition, compression, and transmission; however, no existing study has addressed the subjective and objective evaluation of visual saliency models in the presence of distortion. In this work, a Distortion-based Visual Attention Quality (DVAQ) subjective database is constructed to evaluate the quality of VA maps for images in the presence of distortions. To create this database, saliency maps obtained from images subjected to various types of distortions (including blur, noise and compression) at varying levels of severity are rated by human observers in terms of their visual resemblance to corresponding ground-truth fixation density maps. The performance of traditionally used as well as recently proposed VA metrics is evaluated by correlating their scores with the human subjective ratings. In addition, an objective evaluation of 20 state-of-the-art VA models is performed using the top-performing VA metrics, together with a study of how the VA models' prediction performance changes with different types and levels of distortion. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2016
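The abstract does not give the proposed metric's formula, so purely as a speculative illustration of "local weights based on fixation density": a correlation-style score could weight each pixel's contribution by how heavily it is fixated, making disagreement near fixated regions cost more. Everything below is an assumption, not the thesis's metric:

```python
import numpy as np

def weighted_cc(saliency, fixation_density):
    """Weighted Pearson correlation where per-pixel weights come from
    the (normalised) fixation density map."""
    w = (fixation_density / (fixation_density.sum() + 1e-8)).ravel()
    s, f = saliency.ravel(), fixation_density.ravel()
    ms, mf = np.sum(w * s), np.sum(w * f)          # weighted means
    cov = np.sum(w * (s - ms) * (f - mf))          # weighted covariance
    denom = np.sqrt(np.sum(w * (s - ms) ** 2) * np.sum(w * (f - mf) ** 2))
    return cov / (denom + 1e-8)
```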
6

Visual saliency computation for image analysis

Zhang, Jianming 08 December 2016 (has links)
Visual saliency computation is about detecting and understanding salient regions and elements in a visual scene. Algorithms for visual saliency computation can give clues to where people will look in images, what objects are visually prominent in a scene, and so on. Such algorithms are useful in a wide range of applications in computer vision and graphics. In this thesis, we study the following visual saliency computation problems.

1) Eye Fixation Prediction. Eye fixation prediction aims to predict where people look in a visual scene. For this problem, we propose a Boolean Map Saliency (BMS) model which leverages the global surroundedness cue using a Boolean map representation. We draw a theoretical connection between BMS and the Minimum Barrier Distance (MBD) transform to provide insight into our algorithm. Experimental results show that BMS compares favorably with state-of-the-art methods on seven benchmark datasets.

2) Salient Region Detection. Salient region detection entails computing a saliency map that highlights the regions of dominant objects in a scene. We propose a salient region detection method based on the Minimum Barrier Distance (MBD) transform and present a fast approximate MBD transform algorithm with an error-bound analysis. Powered by this fast MBD transform algorithm, our method can run at about 80 FPS and achieve state-of-the-art performance on four benchmark datasets.

3) Salient Object Detection. Salient object detection aims to localize each salient object instance in an image. We propose a method using a Convolutional Neural Network (CNN) model for proposal generation and a novel subset optimization formulation for bounding box filtering. In experiments, our subset optimization formulation consistently outperforms heuristic bounding box filtering baselines, such as non-maximum suppression, and our method substantially outperforms previous methods on three challenging datasets.

4) Salient Object Subitizing. We propose a new visual saliency computation task, called Salient Object Subitizing, which is to predict the existence and the number of salient objects in an image using holistic cues. To this end, we present an image dataset of about 14K everyday images annotated using an online crowdsourcing marketplace. We show that an end-to-end trained CNN subitizing model can achieve promising performance without requiring any localization process, and we propose a method to further improve its training by leveraging synthetic images.

5) Top-down Saliency Detection. Unlike the aforementioned tasks, top-down saliency detection entails generating task-specific saliency maps. We propose a weakly supervised top-down saliency detection approach by modeling the top-down attention of a CNN image classifier, introducing Excitation Backprop and the concept of contrastive attention to generate highly discriminative top-down saliency maps. Our top-down saliency detection method achieves superior performance in weakly supervised localization tasks on challenging datasets, and its usefulness is further validated in the text-to-region association task, where it provides state-of-the-art performance using only weakly labeled web images for training.
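BMS involves more machinery than an abstract can carry (map openings, normalisation, the MBD connection), but its core surroundedness cue is easy to sketch: threshold the image at several levels and mark thresholded regions that do not touch the image border as salient. A rough, hypothetical rendering, not the published implementation:

```python
import numpy as np
from scipy.ndimage import label

def bms_like(gray, n_thresh=16):
    """Boolean-map-style saliency from the global surroundedness cue:
    connected regions enclosed by a boolean map (never touching the
    image border) are accumulated across thresholds."""
    sal = np.zeros(gray.shape, dtype=float)
    levels = np.linspace(gray.min(), gray.max(), n_thresh + 2)[1:-1]
    for t in levels:
        for bmap in (gray > t, gray <= t):     # map and its complement
            lab, _ = label(bmap)
            border = np.unique(np.concatenate(
                [lab[0, :], lab[-1, :], lab[:, 0], lab[:, -1]]))
            surrounded = bmap & ~np.isin(lab, border)
            sal += surrounded
    return sal / (sal.max() + 1e-8)
```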
7

Self-calibrating eye tracker using image saliency

Vega, Gabriel January 2022 (has links)
8

User-centred video abstraction

Darabi, Kaveh January 2015 (has links)
The rapid growth of digital video content in recent years has imposed the need for technologies that can produce condensed but semantically rich versions of an input video stream in an effective manner. Consequently, the topic of video summarisation is becoming increasingly popular in the multimedia community, and numerous video abstraction approaches have been proposed. These techniques can be divided into two major categories, automatic and semi-automatic, according to the level of human intervention required in the summarisation process. Fully automated methods mainly adopt low-level visual, aural and textual features, together with mathematical and statistical algorithms, to extract the most significant segments of the original video. However, the effectiveness of such techniques is restricted by a number of factors such as domain dependency, computational expense and the inability to infer the semantics of videos from low-level features. The second category of techniques attempts to improve the quality of summaries by involving humans in the abstraction process to bridge the semantic gap. Nonetheless, a single user's subjectivity and other external contributing factors, such as distraction, can degrade the performance of this group of approaches. Accordingly, in this thesis we have focused on the development of three user-centred video summarisation techniques that can be applied to different video categories and generate satisfactory results.

In our first proposed approach, a novel mechanism for user-centred video summarisation is presented for scenarios in which multiple actors take part in the summarisation process, in order to minimise the negative effects of relying on a single user. In this algorithm, the video frames are initially scored by a group of video annotators 'on the fly'. These assigned scores are then averaged to generate a single saliency score for each video frame, and finally the highest-scored video frames, alongside the corresponding audio and textual contents, are extracted for inclusion in the final summary. The effectiveness of this approach has been assessed by comparing the summaries it generates against the results obtained from three existing automatic summarisation tools that adopt different modalities for abstraction. The experimental results indicated that our proposed method delivers strong outcomes in terms of overall satisfaction and precision, with an acceptable recall rate, demonstrating the usefulness of involving user input in the video summarisation process.

To provide a better user experience, we then proposed a personalised video summarisation method that can customise the generated summaries according to the viewers' preferences. The end-user's priority levels towards different video scenes are captured and used to update the average scores previously assigned by the video annotators, after which our earlier summarisation method is applied to extract the most significant audio-visual content of the video. Experimental results indicated that this approach delivers superior outcomes compared with our previously proposed method and the three automatic summarisation tools.
Finally, we have attempted to reduce the required level of audience involvement for personalisation by proposing a new method for producing personalised video summaries. SIFT visual features are adopted to identify the semantic categories of video scenes; by fusing this information with pre-built user profiles, personalised video abstracts can be created. Experimental results showed the effectiveness of this method in delivering superior outcomes compared to our previously proposed algorithm and the three automatic summarisation techniques.
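The frame-scoring and averaging step of the first approach reduces to a few lines. A hedged sketch, where the score-matrix shape and summary ratio are assumptions rather than the thesis's parameters:

```python
import numpy as np

def select_keyframes(score_matrix, ratio=0.1):
    """Average per-frame scores from several annotators and keep the
    top-scoring fraction of frames for the summary.
    score_matrix: shape (n_annotators, n_frames)."""
    mean_scores = score_matrix.mean(axis=0)       # one saliency score per frame
    k = max(1, int(ratio * mean_scores.size))
    keep = np.sort(np.argsort(mean_scores)[-k:])  # top-k frames, in time order
    return keep, mean_scores
```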
9

Obstacle detection for image-guided surface water navigation

Sadhu, Tanmana 09 September 2016 (has links)
A concern for maritime safety when operating a small to medium-sized sailboat is that hazards in the navigational route, in the form of floating logs, can lead to a severe collision if undetected. As a precautionary measure against such collisions, a 2D vision-based detection system is proposed. We take a combined approach involving predictive mapping by linear regression and saliency detection, which is found to overcome specific issues related to illumination changes and the unstructured environment in the dataset. The proposed method has been evaluated using precision and recall measures. This proof of concept demonstrates the potential of the method for deployment in a real-time onboard detection system; the algorithm is robust and of reasonable computational complexity. / Graduate
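The abstract names its two ingredients, linear-regression-based predictive mapping and saliency detection, without detail, so the sketch below is only a guess at one way they could combine: regress a horizon line from supplied points, then keep salient pixels on the water side of it. The horizon-point source, the (x, y) convention and the threshold are all hypothetical:

```python
import numpy as np

def detect_floating_obstacles(saliency, horizon_pts, thresh=0.6):
    """Fit the horizon with a linear regression over (x, y) points,
    then flag above-threshold saliency below that line (on the water)."""
    xs = np.array([p[0] for p in horizon_pts], dtype=float)
    ys = np.array([p[1] for p in horizon_pts], dtype=float)
    slope, intercept = np.polyfit(xs, ys, 1)       # y = slope * x + intercept
    h, w = saliency.shape
    horizon_y = slope * np.arange(w) + intercept
    rows = np.arange(h)[:, None]
    water_mask = rows > horizon_y[None, :]         # pixels below the horizon
    return (saliency >= thresh * saliency.max()) & water_mask
```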
