Using semantic sub-scenes to facilitate scene categorization and understanding

This thesis proposes to learn a cognitive element that conventional scene categorization methods lack, namely sub-scenes, and to use them to better categorize and understand scenes. In scene categorization, ambiguity has been observed to occur when a scene is treated as a whole. Scene ambiguity arises when a similar set of sub-scenes is arranged differently to compose different scenes, or when a scene literally contains several categories. However, these ambiguities can be resolved with knowledge of sub-scenes. Thus, it is worthwhile to study sub-scenes and use them to better understand a scene.

The proposed research first considers an unsupervised method to segment sub-scenes. It emphasizes generating more integral regions rather than the over-segmented regions usually produced by conventional segmentation methods. Several properties of sub-scenes, such as proximity grouping, area of influence, similarity and harmony, are explored based on psychological principles. These properties are formulated into constraints that are used directly in the proposed framework. A self-determined approach is employed to produce a final segmentation result based on the characteristics of each image in an unsupervised manner. The proposed method performs competitively against other state-of-the-art unsupervised segmentation methods, with an F-measure of 0.55, Covering of 0.51 and VoI of 1.93 on the Berkeley segmentation dataset. On the Stanford background dataset, it achieves an overlap score of 0.566, which is higher than the 0.499 of the comparison method.
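As an illustration of two of the region-based scores quoted above, the following is a minimal sketch of Covering and Variation of Information (VoI) computed between a segmentation and a single ground-truth labeling. It follows the commonly used definitions and is not taken from the thesis; the function names, the flattened-list input format and the choice of base-2 logarithms are assumptions made for this example.

```python
from collections import Counter
from math import log2


def variation_of_information(seg_a, seg_b):
    """VoI = H(A) + H(B) - 2 * I(A; B); lower means more similar.

    seg_a and seg_b are equal-length sequences of region labels,
    one label per pixel (assumed input format).
    """
    n = len(seg_a)
    count_a = Counter(seg_a)
    count_b = Counter(seg_b)
    joint = Counter(zip(seg_a, seg_b))
    # Entropy of each segmentation, treating region labels as a distribution.
    h_a = -sum(c / n * log2(c / n) for c in count_a.values())
    h_b = -sum(c / n * log2(c / n) for c in count_b.values())
    # Mutual information from the joint distribution of region labels.
    mi = sum(
        c / n * log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    return h_a + h_b - 2 * mi


def covering(seg, ground_truth):
    """Covering of ground_truth by seg; higher means better (max 1.0).

    Each ground-truth region is matched to the segment with the best
    intersection-over-union, weighted by the region's pixel count.
    """
    n = len(ground_truth)
    size_gt = Counter(ground_truth)
    size_seg = Counter(seg)
    joint = Counter(zip(ground_truth, seg))
    best_iou = {}
    for (g, s), inter in joint.items():
        iou = inter / (size_gt[g] + size_seg[s] - inter)
        best_iou[g] = max(best_iou.get(g, 0.0), iou)
    return sum(size_gt[g] * best_iou[g] for g in size_gt) / n


# Toy example: two ground-truth regions merged into one by the segmentation.
gt = [0, 0, 1, 1]
merged = [0, 0, 0, 0]
voi = variation_of_information(merged, gt)
cov = covering(merged, gt)
```

A benchmark evaluation would typically aggregate such scores over many images and several human annotations per image; this sketch handles one pair of labelings only.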

To segment and label sub-scenes simultaneously, a supervised semantic segmentation approach is proposed. It is built on a Hierarchical Conditional Random Field classification framework. The proposed method integrates contextual information, namely global consistency and spatial context, into the model to improve classification performance. Global consistency is developed by generalizing the scene by scene types, and spatial context takes spatial relationships into account. The proposed method improves semantic segmentation by boosting more logical class combinations. It achieves the best score on the MSRC-21 dataset, with a global accuracy of 87% and an average accuracy of 81%, each of which outperforms all other state-of-the-art methods by 4%. On the Stanford background dataset, it achieves a global accuracy of 80.5% and an average accuracy of 71.8%, also outperforming other methods by 2%.
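The two accuracy figures reported above measure different things, and a short sketch may help make the distinction concrete. It assumes the usual definitions (global accuracy as the fraction of correctly labeled pixels, average accuracy as the mean of per-class recalls); the function name and the list-of-labels input are hypothetical, and pixels of any unlabeled or void class are assumed to have been filtered out beforehand.

```python
from collections import Counter


def global_and_average_accuracy(true_labels, predicted_labels):
    """Return (global accuracy, average per-class accuracy).

    Both inputs are equal-length sequences with one class label per
    pixel (assumed format for this sketch).
    """
    total_per_class = Counter(true_labels)
    correct_per_class = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    # Global accuracy: every pixel counts equally, so large classes dominate.
    global_acc = sum(correct_per_class.values()) / len(true_labels)
    # Average accuracy: every class counts equally, regardless of its size.
    average_acc = sum(
        correct_per_class[c] / total_per_class[c] for c in total_per_class
    ) / len(total_per_class)
    return global_acc, average_acc


# Toy example: a small "boat" region is entirely mislabeled as "sky".
truth = ["sky"] * 8 + ["boat"] * 2
prediction = ["sky"] * 10
global_acc, average_acc = global_and_average_accuracy(truth, prediction)
```

In the toy example the global accuracy stays high because most pixels are sky, while the average accuracy drops because one of the two classes is missed completely. This is why the two figures are usually reported side by side.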

Finally, the proposed research incorporates sub-scenes into the scene categorization framework to improve categorization performance, especially in ambiguous cases. The proposed method encodes sub-scenes in a way that also considers their spatial information. The sub-scene descriptor complements the global descriptor of a scene by evaluating local features with specific geometric attributes. The proposed method obtains an average categorization accuracy of 92.26% on the 8 Scene Category dataset, outperforming all other published methods by over 2%. It evaluates ambiguous cases more accurately by discerning which part exemplifies a scene category and how those categories are organized.

published_or_final_version / Electrical and Electronic Engineering / Doctoral / Doctor of Philosophy

Identifier: oai:union.ndltd.org:HKU/oai:hub.hku.hk:10722/206459
Date: January 2014
Creators: Zhu, Shanshan, 朱珊珊
Contributors: Yung, NHC
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Source Sets: Hong Kong University Theses
Language: English
Detected Language: English
Type: PG_Thesis
Rights: Creative Commons: Attribution 3.0 Hong Kong License, The author retains all proprietary rights, (such as patent rights) and the right to use in future works.
Relation: HKU Theses Online (HKUTO)
