
Learning mid-level representations for scene understanding.

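Abstract (translated from the Chinese):

This thesis describes the scene classification framework and studies in depth the problem of learning mid-level feature representations for natural scenes.

Current scene classification frameworks consist mainly of feature extraction, feature encoding, spatial information integration, and classifier learning. Among these steps, feature extraction is the foundation of image understanding. Local feature representations are regarded as key to the success of computer vision in practical applications. In recent years, however, mid-level representations have attracted growing attention in the field. This thesis interprets mid-level features from two perspectives: one is the aggregation of local low-level information, and the other is the embedding of semantic information. Our work covers both the “aggregation” and the “semantic” aspects.

From the statistics of natural images, we find that correlations among low-level image responses represent local structural information. Based on this finding, we construct a two-layer learning model. The first layer consists of low-level responses that resemble edge responses; the second is an overcomplete covariance feature layer, which is the mid-level information referred to in this thesis. From the perspective of “aggregating local low-level information”, our method goes one step further in this direction. We apply the mid-level features to scene classification and obtain good results. In particular, unlike hand-designed features, our features are learned entirely automatically. The effectiveness of our covariance features offers a new idea for future feature learning: studying the relationships among low-level responses can help build more expressive features.

To incorporate semantic information into the learning of mid-level features, we define a term called “informative components”. Informative components are structures that can describe one class of scene and can also distinguish between different scenes. Based on the assumption of a fixed-rank generative model, we design an optimization model that jointly learns a generative model and a discriminative classifier. Experiments that apply the learned informative components to scene classification confirm the effectiveness of these informative structures. We also find that combining these informative structures with low-level feature representations into a new representation further improves classification accuracy. This finding points the way for our future work: improving overall classification performance by combining feature representations from multiple layers.

Abstract (English):

This thesis reviews state-of-the-art scene classification frameworks and studies the learning of mid-level representations for scene understanding.

The current scene classification pipeline consists of feature extraction, feature encoding, spatial aggregation, and classifier learning. Among these steps, feature extraction is the most fundamental for scene understanding. Beyond low-level features, obtaining effective mid-level representations has attracted growing attention in the scene understanding field in recent years. We interpret mid-level representations from two perspectives: one is aggregation from low-level cues, and the other is the embedding of semantic information. The work in this thesis covers both the “aggregation” and the “semantic” properties.

Given the observation from natural image statistics that correlations among patch-level responses contain strong structural information, we build a two-layer model. The first layer is the patch-level response with an edgelet-like appearance, and the second layer contains sparse covariance patterns, which we treat as the mid-level representation. From the view of “aggregation from low-level cues”, our work moves one step further in this direction.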
We use the learned covariance patterns in scene classification, where they show promising performance even compared with hand-designed features. The effectiveness of our covariance patterns gives a new clue for feature learning: correlations among lower-layer responses can help build more powerful feature representations.

To couple semantic information into the mid-level representation, we define the term “informative components”. Informative components are regions that are descriptive within one class and distinctive among different classes. Based on a generative assumption that descriptive regions fit a fixed-rank model, we provide an integrated optimization framework that combines generative modeling and discriminative learning. Experiments on scene classification bear out the effectiveness of our informative components. We also find that simply concatenating informative components with low-level responses further improves classification performance. This points to a future direction: improving representation power by combining multiple-layer representations.

Detailed summary in vernacular field only.

Wang, Liwei.

Thesis (M.Phil.)--Chinese University of Hong Kong, 2013.

Includes bibliographical references (leaves 62-72).

Abstracts also in Chinese.
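The abstract's observation that correlations among patch-level responses carry structural information can be illustrated with a small sketch. This is not the thesis's two-layer model or its sparse covariance pattern learning; it only shows, under assumed inputs, how the covariance of low-level responses inside a region can be turned into a feature vector. The function name `covariance_descriptor` and the toy `region` data are hypothetical.

```python
from itertools import combinations_with_replacement


def covariance_descriptor(responses):
    """Return the upper triangle of the sample covariance of patch responses.

    `responses` is a list of per-patch response vectors (lists of floats of
    equal length), e.g. the outputs of a few edge-like filters on each patch.
    This is an illustrative stand-in, not the model described in the thesis.
    """
    n = len(responses)
    if n < 2:
        raise ValueError("need at least two patches to estimate covariance")
    d = len(responses[0])

    # Mean response of each filter over all patches in the region.
    means = [sum(r[k] for r in responses) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in responses]

    # The covariance matrix is symmetric, so the upper triangle
    # (including the diagonal) holds all of its information.
    descriptor = []
    for i, j in combinations_with_replacement(range(d), 2):
        cov_ij = sum(c[i] * c[j] for c in centered) / (n - 1)
        descriptor.append(cov_ij)
    return descriptor


# Toy region: three patches, three hypothetical filter responses each.
# The second response is always twice the first; the third is silent.
region = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0]]
desc = covariance_descriptor(region)
print(desc)
```

In the toy data the first two responses co-vary perfectly, which shows up as a nonzero off-diagonal entry; that kind of pairwise relationship is what a covariance-based representation exposes and a per-filter average would miss.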
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Scene Classification Pipeline --- p.1
Chapter 1.2 --- Learning Mid-Level Representations --- p.6
Chapter 1.3 --- Contributions and Organization --- p.7
Chapter 2 --- Background --- p.9
Chapter 2.1 --- Mid-level Representations --- p.9
Chapter 2.1.1 --- Aggregation From Low Level Cues --- p.10
Chapter 2.1.2 --- Embedding Semantic Information --- p.13
Chapter 2.2 --- Scene Data Sets Description --- p.16
Chapter 3 --- Learning Sparse Covariance Patterns --- p.20
Chapter 3.1 --- Introduction --- p.20
Chapter 3.2 --- Model --- p.26
Chapter 3.3 --- Learning and Inference --- p.28
Chapter 3.3.1 --- Inference --- p.28
Chapter 3.3.2 --- Learning --- p.30
Chapter 3.4 --- Experiments --- p.31
Chapter 3.4.1 --- Structure Mapping --- p.33
Chapter 3.4.2 --- 15-Scene Classification --- p.34
Chapter 3.4.3 --- Indoor Scene Recognition --- p.36
Chapter 3.5 --- Summary --- p.38
Chapter 4 --- Learning Informative Components --- p.39
Chapter 4.1 --- Introduction --- p.39
Chapter 4.2 --- Related Work --- p.43
Chapter 4.3 --- Our Model --- p.45
Chapter 4.3.1 --- Component Level Representation --- p.45
Chapter 4.3.2 --- Fixed Rank Modeling --- p.46
Chapter 4.3.3 --- Informative Component Learning --- p.47
Chapter 4.4 --- Experiments --- p.52
Chapter 4.4.1 --- Informative Components Learning --- p.54
Chapter 4.4.2 --- Scene Classification --- p.55
Chapter 4.5 --- Summary --- p.58
Chapter 5 --- Conclusion --- p.60
Bibliography --- p.62
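The four pipeline stages named in the abstract (feature extraction, feature encoding, spatial aggregation, classifier learning) can be sketched end to end on toy data. Every concrete choice below is an assumption made for illustration and is not taken from the thesis: patch means stand in for local features, hard assignment to a hand-written codebook stands in for encoding, a two-cell top/bottom split stands in for spatial aggregation, and a nearest-prototype rule stands in for a learned classifier. All function names and data are hypothetical.

```python
def extract_features(image, patch=2):
    """Stage 1 (toy): mean intensity of each non-overlapping patch."""
    feats = []
    for r in range(0, len(image) - patch + 1, patch):
        for c in range(0, len(image[0]) - patch + 1, patch):
            vals = [image[r + dr][c + dc]
                    for dr in range(patch) for dc in range(patch)]
            feats.append(((r, c), sum(vals) / len(vals)))
    return feats


def encode(value, codebook):
    """Stage 2: hard-assign a feature to the index of its nearest codeword."""
    return min(range(len(codebook)), key=lambda k: abs(value - codebook[k]))


def aggregate(feats, codebook, height):
    """Stage 3: one codeword histogram per spatial cell, concatenated.

    Two cells are used here: the top half and the bottom half of the image.
    """
    hist = [0] * (2 * len(codebook))
    for (r, _c), value in feats:
        cell = 0 if r < height / 2 else 1
        hist[cell * len(codebook) + encode(value, codebook)] += 1
    return hist


def classify(hist, prototypes):
    """Stage 4 (stand-in): pick the label of the nearest class prototype."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: sq_dist(hist, prototypes[label]))


# Toy 4x4 image: bright top half, dark bottom half.
image = [[9, 9, 9, 9],
         [9, 9, 9, 9],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
codebook = [0.0, 9.0]  # hypothetical "dark" and "bright" codewords
prototypes = {
    "bright_over_dark": [0, 2, 2, 0],
    "dark_over_bright": [2, 0, 0, 2],
}

hist = aggregate(extract_features(image), codebook, height=len(image))
label = classify(hist, prototypes)
print(hist, label)
```

The spatial split is what lets the two toy classes be told apart: without it, both layouts would produce the same global histogram of two bright and two dark patches.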

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_328775
Date: January 2013
Contributors: Wang, Liwei., Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering.
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, bibliography
Format: electronic resource, remote, 1 online resource (x, 72 leaves) : ill. (some col.)
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
