Discriminant feature pursuit: from statistical learning to informative learning.

Lin Dahua.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2006.
Includes bibliographical references (leaves 233-250).
Abstracts in English and Chinese.

Abstract --- p.i
Acknowledgement --- p.iii
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- The Problem We are Facing --- p.1
Chapter 1.2 --- Generative vs. Discriminative Models --- p.2
Chapter 1.3 --- Statistical Feature Extraction: Success and Challenge --- p.3
Chapter 1.4 --- Overview of Our Works --- p.5
Chapter 1.4.1 --- New Linear Discriminant Methods: Generalized LDA Formulation and Performance-Driven Subspace Learning --- p.5
Chapter 1.4.2 --- Coupled Learning Models: Coupled Space Learning and Inter-Modality Recognition --- p.6
Chapter 1.4.3 --- Informative Learning Approaches: Conditional Infomax Learning and Information Channel Model --- p.6
Chapter 1.5 --- Organization of the Thesis --- p.8
Chapter I --- History and Background --- p.10
Chapter 2 --- Statistical Pattern Recognition --- p.11
Chapter 2.1 --- Patterns and Classifiers --- p.11
Chapter 2.2 --- Bayes Theory --- p.12
Chapter 2.3 --- Statistical Modeling --- p.14
Chapter 2.3.1 --- Maximum Likelihood Estimation --- p.14
Chapter 2.3.2 --- Gaussian Model --- p.15
Chapter 2.3.3 --- Expectation-Maximization --- p.17
Chapter 2.3.4 --- Finite Mixture Model --- p.18
Chapter 2.3.5 --- A Nonparametric Technique: Parzen Windows --- p.21
Chapter 3 --- Statistical Learning Theory --- p.24
Chapter 3.1 --- Formulation of Learning Model --- p.24
Chapter 3.1.1 --- Learning: Functional Estimation Model --- p.24
Chapter 3.1.2 --- Representative Learning Problems --- p.25
Chapter 3.1.3 --- Empirical Risk Minimization --- p.26
Chapter 3.2 --- Consistency and Convergence of Learning --- p.27
Chapter 3.2.1 --- Concept of Consistency --- p.27
Chapter 3.2.2 --- The Key Theorem of Learning Theory --- p.28
Chapter 3.2.3 --- VC Entropy --- p.29
Chapter 3.2.4 --- Bounds on Convergence --- p.30
Chapter 3.2.5 --- VC Dimension --- p.35
Chapter 4 --- History of Statistical Feature Extraction --- p.38
Chapter 4.1 --- Linear Feature Extraction --- p.38
Chapter 4.1.1 --- Principal Component Analysis (PCA) --- p.38
Chapter 4.1.2 --- Linear Discriminant Analysis (LDA) --- p.41
Chapter 4.1.3 --- Other Linear Feature Extraction Methods --- p.46
Chapter 4.1.4 --- Comparison of Different Methods --- p.48
Chapter 4.2 --- Enhanced Models --- p.49
Chapter 4.2.1 --- Stochastic Discrimination and Random Subspace --- p.49
Chapter 4.2.2 --- Hierarchical Feature Extraction --- p.51
Chapter 4.2.3 --- Multilinear Analysis and Tensor-based Representation --- p.52
Chapter 4.3 --- Nonlinear Feature Extraction --- p.54
Chapter 4.3.1 --- Kernelization --- p.54
Chapter 4.3.2 --- Dimension reduction by Manifold Embedding --- p.56
Chapter 5 --- Related Works in Feature Extraction --- p.59
Chapter 5.1 --- Dimension Reduction --- p.59
Chapter 5.1.1 --- Feature Selection --- p.60
Chapter 5.1.2 --- Feature Extraction --- p.60
Chapter 5.2 --- Kernel Learning --- p.61
Chapter 5.2.1 --- Basic Concepts of Kernel --- p.61
Chapter 5.2.2 --- The Reproducing Kernel Map --- p.62
Chapter 5.2.3 --- The Mercer Kernel Map --- p.64
Chapter 5.2.4 --- The Empirical Kernel Map --- p.65
Chapter 5.2.5 --- Kernel Trick and Kernelized Feature Extraction --- p.66
Chapter 5.3 --- Subspace Analysis --- p.68
Chapter 5.3.1 --- Basis and Subspace --- p.68
Chapter 5.3.2 --- Orthogonal Projection --- p.69
Chapter 5.3.3 --- Orthonormal Basis --- p.70
Chapter 5.3.4 --- Subspace Decomposition --- p.70
Chapter 5.4 --- Principal Component Analysis --- p.73
Chapter 5.4.1 --- PCA Formulation --- p.73
Chapter 5.4.2 --- Solution to PCA --- p.75
Chapter 5.4.3 --- Energy Structure of PCA --- p.76
Chapter 5.4.4 --- Probabilistic Principal Component Analysis --- p.78
Chapter 5.4.5 --- Kernel Principal Component Analysis --- p.81
Chapter 5.5 --- Independent Component Analysis --- p.83
Chapter 5.5.1 --- ICA Formulation --- p.83
Chapter 5.5.2 --- Measurement of Statistical Independence --- p.84
Chapter 5.6 --- Linear Discriminant Analysis --- p.85
Chapter 5.6.1 --- Fisher's Linear Discriminant Analysis --- p.85
Chapter 5.6.2 --- Improved Algorithms for Small Sample Size Problem --- p.89
Chapter 5.6.3 --- Kernel Discriminant Analysis --- p.92
Chapter II --- Improvement in Linear Discriminant Analysis --- p.100
Chapter 6 --- Generalized LDA --- p.101
Chapter 6.1 --- Regularized LDA --- p.101
Chapter 6.1.1 --- Generalized LDA Implementation Procedure --- p.101
Chapter 6.1.2 --- Optimal Nonsingular Approximation --- p.103
Chapter 6.1.3 --- Regularized LDA algorithm --- p.104
Chapter 6.2 --- A Statistical View: When is LDA optimal? --- p.105
Chapter 6.2.1 --- Two-class Gaussian Case --- p.106
Chapter 6.2.2 --- Multi-class Cases --- p.107
Chapter 6.3 --- Generalized LDA Formulation --- p.108
Chapter 6.3.1 --- Mathematical Preparation --- p.108
Chapter 6.3.2 --- Generalized Formulation --- p.110
Chapter 7 --- Dynamic Feedback Generalized LDA --- p.112
Chapter 7.1 --- Basic Principle --- p.112
Chapter 7.2 --- Dynamic Feedback Framework --- p.113
Chapter 7.2.1 --- Initialization: K-Nearest Construction --- p.113
Chapter 7.2.2 --- Dynamic Procedure --- p.115
Chapter 7.3 --- Experiments --- p.115
Chapter 7.3.1 --- Performance in Training Stage --- p.116
Chapter 7.3.2 --- Performance on Testing set --- p.118
Chapter 8 --- Performance-Driven Subspace Learning --- p.119
Chapter 8.1 --- Motivation and Principle --- p.119
Chapter 8.2 --- Performance-Based Criteria --- p.121
Chapter 8.2.1 --- The Verification Problem and Generalized Average Margin --- p.122
Chapter 8.2.2 --- Performance Driven Criteria based on Generalized Average Margin --- p.123
Chapter 8.3 --- Optimal Subspace Pursuit --- p.125
Chapter 8.3.1 --- Optimal threshold --- p.125
Chapter 8.3.2 --- Optimal projection matrix --- p.125
Chapter 8.3.3 --- Overall procedure --- p.129
Chapter 8.3.4 --- Discussion of the Algorithm --- p.129
Chapter 8.4 --- Optimal Classifier Fusion --- p.130
Chapter 8.5 --- Experiments --- p.131
Chapter 8.5.1 --- Performance Measurement --- p.131
Chapter 8.5.2 --- Experiment Setting --- p.131
Chapter 8.5.3 --- Experiment Results --- p.133
Chapter 8.5.4 --- Discussion --- p.139
Chapter III --- Coupled Learning of Feature Transforms --- p.140
Chapter 9 --- Coupled Space Learning --- p.141
Chapter 9.1 --- Introduction --- p.142
Chapter 9.1.1 --- What is Image Style Transform --- p.142
Chapter 9.1.2 --- Overview of our Framework --- p.143
Chapter 9.2 --- Coupled Space Learning --- p.143
Chapter 9.2.1 --- Framework of Coupled Modelling --- p.143
Chapter 9.2.2 --- Correlative Component Analysis --- p.145
Chapter 9.2.3 --- Coupled Bidirectional Transform --- p.148
Chapter 9.2.4 --- Procedure of Coupled Space Learning --- p.151
Chapter 9.3 --- Generalization to Mixture Model --- p.152
Chapter 9.3.1 --- Coupled Gaussian Mixture Model --- p.152
Chapter 9.3.2 --- Optimization by EM Algorithm --- p.152
Chapter 9.4 --- Integrated Framework for Image Style Transform --- p.154
Chapter 9.5 --- Experiments --- p.156
Chapter 9.5.1 --- Face Super-resolution --- p.156
Chapter 9.5.2 --- Portrait Style Transforms --- p.157
Chapter 10 --- Inter-Modality Recognition --- p.162
Chapter 10.1 --- Introduction to the Inter-Modality Recognition Problem --- p.163
Chapter 10.1.1 --- What is Inter-Modality Recognition --- p.163
Chapter 10.1.2 --- Overview of Our Feature Extraction Framework --- p.163
Chapter 10.2 --- Common Discriminant Feature Extraction --- p.165
Chapter 10.2.1 --- Formulation of the Learning Problem --- p.165
Chapter 10.2.2 --- Matrix-Form of the Objective --- p.168
Chapter 10.2.3 --- Solving the Linear Transforms --- p.169
Chapter 10.3 --- Kernelized Common Discriminant Feature Extraction --- p.170
Chapter 10.4 --- Multi-Mode Framework --- p.172
Chapter 10.4.1 --- Multi-Mode Formulation --- p.172
Chapter 10.4.2 --- Optimization Scheme --- p.174
Chapter 10.5 --- Experiments --- p.176
Chapter 10.5.1 --- Experiment Settings --- p.176
Chapter 10.5.2 --- Experiment Results --- p.177
Chapter IV --- A New Perspective: Informative Learning --- p.180
Chapter 11 --- Toward Information Theory --- p.181
Chapter 11.1 --- Entropy and Mutual Information --- p.181
Chapter 11.1.1 --- Entropy --- p.182
Chapter 11.1.2 --- Relative Entropy (Kullback Leibler Divergence) --- p.184
Chapter 11.2 --- Mutual Information --- p.184
Chapter 11.2.1 --- Definition of Mutual Information --- p.184
Chapter 11.2.2 --- Chain rules --- p.186
Chapter 11.2.3 --- Information in Data Processing --- p.188
Chapter 11.3 --- Differential Entropy --- p.189
Chapter 11.3.1 --- Differential Entropy of Continuous Random Variable --- p.189
Chapter 11.3.2 --- Mutual Information of Continuous Random Variable --- p.190
Chapter 12 --- Conditional Infomax Learning --- p.191
Chapter 12.1 --- An Overview --- p.192
Chapter 12.2 --- Conditional Informative Feature Extraction --- p.193
Chapter 12.2.1 --- Problem Formulation and Features --- p.193
Chapter 12.2.2 --- The Information Maximization Principle --- p.194
Chapter 12.2.3 --- The Information Decomposition and the Conditional Objective --- p.195
Chapter 12.3 --- The Efficient Optimization --- p.197
Chapter 12.3.1 --- Discrete Approximation Based on AEP --- p.197
Chapter 12.3.2 --- Analysis of Terms and Their Derivatives --- p.198
Chapter 12.3.3 --- Local Active Region Method --- p.200
Chapter 12.4 --- Bayesian Feature Fusion with Sparse Prior --- p.201
Chapter 12.5 --- The Integrated Framework for Feature Learning --- p.202
Chapter 12.6 --- Experiments --- p.203
Chapter 12.6.1 --- A Toy Problem --- p.203
Chapter 12.6.2 --- Face Recognition --- p.204
Chapter 13 --- Channel-based Maximum Effective Information --- p.209
Chapter 13.1 --- Motivation and Overview --- p.209
Chapter 13.2 --- Maximizing Effective Information --- p.211
Chapter 13.2.1 --- Relation between Mutual Information and Classification --- p.211
Chapter 13.2.2 --- Linear Projection and Metric --- p.212
Chapter 13.2.3 --- Channel Model and Effective Information --- p.213
Chapter 13.2.4 --- Parzen Window Approximation --- p.216
Chapter 13.3 --- Parameter Optimization on Grassmann Manifold --- p.217
Chapter 13.3.1 --- Grassmann Manifold --- p.217
Chapter 13.3.2 --- Conjugate Gradient Optimization on Grassmann Manifold --- p.219
Chapter 13.3.3 --- Computation of Gradient --- p.221
Chapter 13.4 --- Experiments --- p.222
Chapter 13.4.1 --- A Toy Problem --- p.222
Chapter 13.4.2 --- Face Recognition --- p.223
Chapter 14 --- Conclusion --- p.230

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_325641
Date: January 2006
Contributors: Lin, Dahua., Chinese University of Hong Kong Graduate School. Division of Information Engineering.
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, bibliography
Format: print, xv, 250 leaves : ill. ; 30 cm.
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
