
Deep learning for attribute inference, parsing, and recognition of face / CUHK electronic theses & dissertations collection

Deep learning has been widely and successfully applied to many difficult tasks in computer vision, such as image parsing, object detection, and object recognition, where architectures such as deep neural networks, deep convolutional neural networks, and deep belief networks have achieved impressive performance and significantly outperformed previous state-of-the-art methods. However, the potential of deep learning for face-related problems has not been fully explored. In this thesis, we explore different deep learning methods and propose new network architectures and learning algorithms for face-related applications, including face parsing, face attribute inference, and face recognition.

For face parsing, we propose a novel face parser that recasts the segmentation of face components as a cross-modality data transformation problem, i.e., transforming an image patch into a label map. Specifically, a face is represented hierarchically by parts, components, and pixel-wise labels. With this representation, the approach first detects faces at both the part and component levels, and then computes the pixel-wise label maps. The part-based and component-based detectors are generatively trained with a deep belief network (DBN) and discriminatively tuned by logistic regression. The segmentators transform the detected face components into label maps by learning a highly nonlinear mapping with a deep autoencoder. Compared with face keypoint detection and face alignment, the proposed hierarchical face parsing is not only robust to partial occlusions but also provides richer information for face analysis and face synthesis.
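The patch-to-label-map transformation can be pictured with a small sketch. The following is a minimal PyTorch illustration, not the thesis implementation: the DBN pre-training and logistic-regression tuning described above are omitted, and the 64x64 patch size, six-label output, and layer widths are assumptions made only for this example.

# Minimal sketch (assumptions noted above): a deep autoencoder-style network
# that maps a grayscale face-component patch to a pixel-wise label map,
# illustrating the "cross-modality data transformation" idea.
import torch
import torch.nn as nn

PATCH = 64          # assumed patch resolution
NUM_LABELS = 6      # assumed number of face-component labels

class PatchToLabelMap(nn.Module):
    def __init__(self, patch=PATCH, num_labels=NUM_LABELS):
        super().__init__()
        d_in = patch * patch
        # encoder: image patch -> low-dimensional code
        self.encoder = nn.Sequential(
            nn.Linear(d_in, 1024), nn.ReLU(),
            nn.Linear(1024, 256), nn.ReLU(),
        )
        # decoder: code -> per-pixel label scores (the target modality)
        self.decoder = nn.Sequential(
            nn.Linear(256, 1024), nn.ReLU(),
            nn.Linear(1024, d_in * num_labels),
        )
        self.patch, self.num_labels = patch, num_labels

    def forward(self, x):                    # x: (B, 1, patch, patch)
        code = self.encoder(x.flatten(1))
        logits = self.decoder(code)
        # reshape to (B, num_labels, patch, patch) for per-pixel classification
        return logits.view(-1, self.num_labels, self.patch, self.patch)

if __name__ == "__main__":
    model = PatchToLabelMap()
    patches = torch.rand(8, 1, PATCH, PATCH)                    # dummy input patches
    targets = torch.randint(0, NUM_LABELS, (8, PATCH, PATCH))   # dummy label maps
    loss = nn.CrossEntropyLoss()(model(patches), targets)
    loss.backward()                                             # one illustrative training step
    print(loss.item())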
For face attribute inference, the proposed approach captures the interdependencies of local regions for each attribute, as well as the high-order correlations between different attributes, which makes it more robust to occlusions and misdetection of face regions. First, we model region interdependencies with a discriminative decision tree, where each node consists of a detector and a classifier trained on a local region: the detector locates the region, while the classifier determines the presence or absence of an attribute. Second, correlations among attributes and attribute predictors are modeled by organizing all of the decision trees into a large sum-product network (SPN), which is learned with the EM algorithm and yields the most probable explanation (MPE) of the facial attributes in terms of region localization and classification. Experimental results on a large dataset of 22,400 images show the effectiveness of the proposed approach.

For face recognition, this thesis proposes a new deep learning framework that recovers the canonical view of face images. It dramatically reduces intra-person variations while maintaining inter-person discriminativeness. Unlike existing face reconstruction methods, which were either evaluated in controlled 2D environments or relied on 3D information, our approach directly learns the transformation between face images with a complex set of variations and their canonical views. At the training stage, to avoid the costly process of hand-labeling canonical-view images in the training set, we devise a new measurement and algorithm to automatically select or synthesize a canonical-view image for each identity. The recovered canonical-view face images are then matched with a facial component-based convolutional neural network. Our approach achieves the best performance on the LFW dataset under the unrestricted protocol. We also demonstrate that the performance of existing methods can be improved when they are applied to our recovered canonical-view face images.

In recent years, deep learning algorithms have been successfully applied to many difficult computer vision problems, such as image segmentation, object recognition, and object detection. Deep learning models such as deep neural networks, deep convolutional neural networks, and deep belief networks have made important breakthroughs in these areas and outperform traditional computer vision algorithms. However, face images, one of the most important elements of human visual perception, had not yet been studied within a deep learning framework. Taking face image analysis as its setting, this thesis investigates suitable deep learning algorithms and different deep network architectures, focusing on face parsing, face attribute inference, and face recognition.

For face parsing, we turn the traditional segmentation problem into a high-dimensional data transformation problem, i.e., transforming a face image into a label map. A face image is represented hierarchically by pixel patches, facial keypoints (components), and face regions. With this representation, our method first detects face regions, then detects facial keypoints, and finally transforms pixel patches into label maps according to the keypoint locations. The proposed method consists of two steps: keypoint detection, performed with a deep belief network, and patch-to-label-map transformation, performed with a deep autoencoder. The method is also robust to facial occlusion.

For face attribute inference, the proposed method models two kinds of correlations: the interdependencies among key face regions and the correlations among face attributes. Region interdependencies are modeled with decision trees, and attribute correlations are modeled by finding a sum-product network that corresponds one-to-one with the decision trees. Experiments on 22,400 face images verify the effectiveness and robustness of the proposed method.

For face recognition, this thesis proposes a new face representation called the face identity-preserving feature. It maintains the discriminativeness between faces of different identities while reducing the variations among faces of the same identity, and it can recover the frontal view of an input face image. Normalizing faces with these recovered frontal views improves the accuracy of existing face recognition algorithms.

Luo, Ping. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2014. / Includes bibliographical references (leaves 83-95). / Abstracts also in Chinese. / Title from PDF title page (viewed on 27 October 2016). / Detailed summary in vernacular field only.

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_1290660
Date: January 2014
Contributors: Luo, Ping (author), Tang, Xiaoou (thesis advisor), Wang, Xiaogang, active 2003 (thesis advisor), Chinese University of Hong Kong Graduate School. Division of Information Engineering (degree granting institution)
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, bibliography
Format: electronic resource, remote, 1 online resource (2, xxiv, 95 leaves) : illustrations (some color), computer, online resource
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons "Attribution-NonCommercial-NoDerivatives 4.0 International" License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
