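In many computer vision and pattern recognition problems, the captured image and video data are usually high-dimensional. Computing directly on such high-dimensional data often runs into difficulties of computational feasibility and stability. Real-world data, however, are usually generated by a small number of physical factors and therefore have an intrinsically degenerate structure. For example, they can be described by models such as a subspace, a union of subspaces, a manifold, or a stratified manifold. Computing and exploiting these intrinsic degenerate structures not only deepens our understanding of the problem but also helps solve difficulties that arise in practical applications. / With the development of convex optimization theory and applications in recent years, some NP-hard problems, such as low-rank matrix computation and sparse representation, now have nearly perfect and efficient solution methods. This thesis studies how to apply these techniques to compute the degenerate structures in high-dimensional data, focusing on two structures, subspaces and unions of subspaces, and on their significance in real applications. These applications include face image alignment, background subtraction, and automatic plant identification. / In face image alignment, images of the same face under different illumination should lie in a low-dimensional subspace once they are aligned pixel by pixel. Based on this assumption, we propose a new image alignment method that jointly aligns multiple images of an unseen face taken under different illumination, expressions, and poses, so that the pixels of each face image match a pre-trained generic face model. The basic idea is to pursue an affine subspace that is low-dimensional and lies near the generic face subspace. Conventional appearance-based alignment methods, such as the active appearance model, depend on an accurate appearance model; the proposed method needs only a generic face model to jointly align multiple images of the unseen face well, even when that face differs greatly from the samples used to train the model. Experiments show that in some cases the alignment accuracy approaches that of the ideal case, namely the accuracy conventional methods achieve when a model of the target face is known in advance. / In a wide range of computer vision and pattern recognition problems, the captured images and videos often live in high-dimensional observation spaces. Processing them directly may suffer from computational infeasibility and numerical instability. On the other hand, real-world data are often generated by a limited number of physical causes and thus embed degenerate structures by nature. For instance, they can be modeled by a low-dimensional subspace, a union of subspaces, a manifold, or even a manifold stratification. Discovering and harnessing such intrinsic structures not only brings semantic insight into the problems at hand but also provides critical information for overcoming challenges encountered in practice. / Recent years have witnessed great development in both the theory and application of convex optimization. Efficient and elegant solutions have been found for NP-hard problems such as low-rank matrix recovery and sparse representation. In this thesis, we study the problem of discovering degenerate structures of high-dimensional inputs using these techniques. 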
In particular, we focus on low-dimensional subspaces and their unions, and address their application in overcoming the challenges encountered in three practical scenarios: face image alignment, background subtraction, and automatic plant identification. / In facial image alignment, we propose a method that jointly brings multiple images of an unseen face into alignment with a pre-trained generic appearance model despite different poses, expressions, and illumination conditions of the face in the images. The idea is to pursue an intrinsic affine subspace of the target face that is low-dimensional and at the same time lies close to the generic subspace. Compared with conventional appearance-based methods that rely on accurate appearance models, ours works well with only a generic one and performs much better on unseen faces, even if they differ significantly from those used to train the generic model. The result is approximately as good as that in an idealized case where a specific model for the target face is provided. / For background subtraction, we propose a background model that captures the changes caused by the background switching among a few configurations, such as traffic light states. The background is modeled as a union of low-dimensional subspaces, each characterizing one configuration of the background, and the proposed algorithm automatically switches among them and identifies violating elements as foreground pixels. Moreover, we propose a robust learning approach that can work with foreground-present training samples: at the background modeling stage, it builds a correct background model with outlying foreground pixels automatically pruned out. This is practically important when foreground-free training samples are difficult to obtain, in scenarios such as traffic monitoring. / For automatic plant identification, we propose a novel and practical method that recognizes plants based on leaf shapes extracted from photographs. 
Unlike existing studies, which mostly focus on simple leaves, the proposed method is designed to recognize both simple and compound leaves. The key is that, instead of either measuring geometric features or matching shape features as in conventional methods, we describe leaves by counting the occurrences of certain shape patterns on them. The patterns are learned such that they form a degenerate polytope (a special union of affine subspaces) in the feature space and can simulate, to some extent, the "keys" used by botanists: each pattern reflects a common feature of several different species, and all the patterns together form a discriminative rule for recognition. Experiments conducted on a variety of datasets show that our algorithm significantly outperforms state-of-the-art methods in terms of recognition accuracy, efficiency, and storage, and thus shows good promise for practical use. / In conclusion, our studies show that: 1) visual data with semantic meaning are often not random; although they can be high-dimensional, they typically embed degenerate structures in the observation space. 2) With appropriate assumptions and well-designed computational tools, these structures can be computed efficiently and stably. 3) Exploiting these intrinsic structures helps overcome practical challenges and is critical for computer vision and pattern recognition algorithms to achieve good performance. / Detailed summary in vernacular field only. 
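/ In background subtraction, the background of a static scene under different illumination can be described as a linear subspace. In practice, however, local and sudden changes in the background may violate this assumption, especially when the background switches among several states, for example when traffic lights switch among different combinations of states. To address this, the thesis proposes a new background model that describes the background as a union of subspaces, each corresponding to one background state. We cast background subtraction as a sparse approximation problem, so the algorithm can switch automatically among the states and successfully detect foreground objects. In addition, the thesis proposes a robust dictionary learning method. While training the background model, it can handle images that contain foreground objects and automatically removes the foreground parts during training. This is a clear advantage in applications where foreground-free training samples are hard to collect, such as traffic monitoring. / In automatic plant identification, the thesis proposes a new and effective method that identifies and classifies plants by extracting and comparing the contours of their leaves. Unlike conventional methods based on measuring geometric features or matching shape features, we represent a leaf by the counts of certain shape patterns found on it. These patterns form a degenerate polytope (a special union of affine subspaces) in the feature space and, to some extent, mimic the identification keys used in botany: each pattern reflects a property shared by several different plants, such as a certain kind of margin, a certain shape, or a certain arrangement of leaflets, and all the patterns combined form a highly discriminative classification rule. Tests on four datasets show that the proposed method significantly improves on current mainstream methods in recognition accuracy, efficiency, and storage, and is therefore well suited to practical use. / In summary, our series of studies shows that: (1) meaningful visual data are usually intrinsically correlated; although their dimensionality may be high, they usually have some degenerate structure. (2) With reasonable assumptions and suitable computational tools, these structures can be discovered efficiently and robustly. (3) Exploiting these structures helps solve difficulties in practical applications and enables computer vision and pattern recognition algorithms to achieve good performance. / Zhao, Cong. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2012. / Includes bibliographical references (leaves 107-121). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. 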
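The union-of-subspaces background model described in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm, which casts the problem as sparse approximation with a learned dictionary; as a simplifying assumption, the sketch projects a frame onto each candidate subspace by ordinary least squares, keeps the subspace with the smallest residual, and marks pixels with a large residual as foreground. The function name `subtract_background`, the orthonormal-basis input format, and the fixed `threshold` are all hypothetical choices made for illustration.

```python
import numpy as np


def subtract_background(frame, bases, threshold):
    """Pick the best-fitting background subspace and flag foreground pixels.

    frame:     (d,) vectorized image (hypothetical input format).
    bases:     list of (d, k_i) matrices with orthonormal columns, one per
               background configuration (assumed to be given, not learned here).
    threshold: absolute residual above which a pixel counts as foreground.
    """
    best_index = -1
    best_residual = None
    best_norm = np.inf
    for i, U in enumerate(bases):
        # Least-squares projection onto the i-th subspace (a simplification:
        # the thesis uses sparse approximation rather than plain projection).
        residual = frame - U @ (U.T @ frame)
        norm = np.linalg.norm(residual)
        if norm < best_norm:
            best_index, best_residual, best_norm = i, residual, norm
    # Pixels that the chosen subspace cannot explain are treated as foreground.
    foreground_mask = np.abs(best_residual) > threshold
    return best_index, foreground_mask


if __name__ == "__main__":
    # Two toy one-dimensional background configurations in a 6-pixel image.
    U0 = np.ones((6, 1)) / np.sqrt(6)
    U1 = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0]]).T / np.sqrt(3)
    # Frame drawn from the second configuration plus one foreground pixel.
    frame = np.array([2.0, 2.0, 2.0, 0.0, 1.0, 0.0])
    index, mask = subtract_background(frame, [U0, U1], threshold=0.5)
    print(index, mask)
```

Plain least-squares projection is sensitive to large foreground regions, which is one reason a robust or sparse formulation, as the abstract describes, is preferable in practice.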
/ Dedication --- p.i / Acknowledgements --- p.ii / Abstract --- p.v / Abstract (in Chinese) --- p.viii / Publication List --- p.xi / Nomenclature --- p.xii / Contents --- p.xiv / List of Figures --- p.xviii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Motivation --- p.1 / Chapter 1.2 --- Background --- p.2 / Chapter 1.2.1 --- Subspaces --- p.3 / Chapter 1.2.2 --- Unions of Subspaces --- p.6 / Chapter 1.2.3 --- Manifolds and Stratifications --- p.8 / Chapter 1.3 --- Thesis Outline --- p.10 / Chapter 2 --- Joint Face Image Alignment --- p.13 / Chapter 2.1 --- Introduction --- p.14 / Chapter 2.2 --- Related Works --- p.16 / Chapter 2.3 --- Background --- p.18 / Chapter 2.3.1 --- Active Appearance Model --- p.18 / Chapter 2.3.2 --- Multi-Image Alignment using AAM --- p.20 / Chapter 2.3.3 --- Limitations in Practice --- p.21 / Chapter 2.4 --- The Proposed Method --- p.23 / Chapter 2.4.1 --- Two Important Assumptions --- p.23 / Chapter 2.4.2 --- The Subspace Pursuit Problem --- p.27 / Chapter 2.4.3 --- Reformulation --- p.27 / Chapter 2.4.4 --- Efficient Solution --- p.30 / Chapter 2.4.5 --- Discussions --- p.32 / Chapter 2.5 --- Experiments --- p.34 / Chapter 2.5.1 --- Settings --- p.34 / Chapter 2.5.2 --- Results and Discussions --- p.36 / Chapter 2.6 --- Summary --- p.38 / Chapter 3 --- Background Subtraction --- p.40 / Chapter 3.1 --- Introduction --- p.41 / Chapter 3.2 --- Related Works --- p.43 / Chapter 3.3 --- The Proposed Method --- p.48 / Chapter 3.3.1 --- Background Modeling --- p.48 / Chapter 3.3.2 --- Background Subtraction --- p.49 / Chapter 3.3.3 --- Foreground Object Detection --- p.52 / Chapter 3.3.4 --- Background Modeling by Dictionary Learning --- p.53 / Chapter 3.4 --- Robust Dictionary Learning --- p.54 / Chapter 3.4.1 --- Robust Sparse Coding --- p.56 / Chapter 3.4.2 --- Robust Dictionary Update --- p.57 / Chapter 3.5 --- Experimentation --- p.59 / Chapter 3.5.1 --- Local and Sudden Changes --- p.59
/ Chapter 3.5.2 --- Non-structured High-frequency Changes --- p.62 / Chapter 3.5.3 --- Discussions --- p.65 / Chapter 3.6 --- Summary --- p.66 / Chapter 4 --- Plant Identification using Leaves --- p.67 / Chapter 4.1 --- Introduction --- p.68 / Chapter 4.2 --- Related Works --- p.70 / Chapter 4.3 --- Review of IDSC Feature --- p.71 / Chapter 4.4 --- The Proposed Method --- p.73 / Chapter 4.4.1 --- Independent-IDSC Feature --- p.75 / Chapter 4.4.2 --- Common Shape Patterns --- p.77 / Chapter 4.4.3 --- Leaf Representation by Counts --- p.80 / Chapter 4.4.4 --- Leaf Recognition by NN Classifier --- p.82 / Chapter 4.5 --- Experiments --- p.82 / Chapter 4.5.1 --- Settings --- p.82 / Chapter 4.5.2 --- Performance --- p.83 / Chapter 4.5.3 --- Shared Dictionaries v.s. Shared Features --- p.88 / Chapter 4.5.4 --- Pooling --- p.89 / Chapter 4.6 --- Discussions --- p.90 / Chapter 4.6.1 --- Time Complexity --- p.90 / Chapter 4.6.2 --- Space Complexity --- p.91 / Chapter 4.6.3 --- System Description --- p.92 / Chapter 4.7 --- Summary --- p.92 / Chapter 4.8 --- Acknowledgement --- p.94 / Chapter 5 --- Conclusion and Future Work --- p.95 / Chapter 5.1 --- Thesis Contributions --- p.95 / Chapter 5.2 --- Future Work --- p.97 / Chapter 5.2.1 --- Theory Side --- p.98 / Chapter 5.2.2 --- Practice Side --- p.98 / Appendix-I --- Joint Face Alignment Results --- p.100 / Bibliography --- p.107
Identifier | oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_328205 |
Date | January 2012 |
Contributors | Zhao, Cong, Chinese University of Hong Kong Graduate School. Division of Electronic Engineering. |
Source Sets | The Chinese University of Hong Kong |
Language | English, Chinese |
Detected Language | English |
Type | Text, bibliography |
Format | electronic resource, remote, 1 online resource (xx, 121 leaves) : ill. (chiefly col.) |
Rights | Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/) |