Topics on Machine Learning under Imperfect Supervision

This dissertation comprises several studies addressing supervised learning problems where the supervision is imperfect.

First, we investigate margin conditions in active learning. Active learning is characterized by a mechanism in which the learner can sample freely over the feature space and make the most of a limited labeling budget by querying the most informative labels. Our primary focus is to identify the conditions under which certain active learning algorithms can outperform the optimal passive learning minimax rate. Within a non-parametric multi-class classification framework, our results reveal that the uniqueness of Bayes labels across the feature space is the pivotal determinant of the superiority of active learning over passive learning.

Second, we study the estimation of the central mean subspace (CMS) and its application in transfer learning. We show that a fast parametric convergence rate is achievable by estimating the expected smoothed gradient outer product, for a general class of covariate distributions that includes Gaussian and heavier-tailed distributions. When the link function is a polynomial of degree at most r and the covariates follow the standard Gaussian distribution, we show that the prefactor depends on the ambient dimension d as d^r. Furthermore, we show that in a transfer learning setting, the oracle rate of prediction error, as if the CMS were known, is achievable when the source training data is abundant.
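
To make the gradient outer product idea concrete, the following is a minimal sketch, not the estimator developed in the dissertation. It assumes a single-index model y = g(b'x) + noise with standard Gaussian covariates, estimates gradients of the regression function by plain local linear regression (a swapped-in textbook technique, without the smoothing analyzed in the thesis), and takes the leading eigenvectors of the averaged gradient outer product as the subspace estimate. The function names, bandwidth, sample sizes, and link function are all hypothetical choices made for illustration.

```python
import numpy as np

def local_linear_gradients(X, y, queries, bandwidth):
    """Estimate the gradient of E[y | x] at each query point by
    Gaussian-kernel-weighted least squares (local linear regression)."""
    n, d = X.shape
    grads = np.empty((len(queries), d))
    for i, x0 in enumerate(queries):
        diff = X - x0
        # Gaussian kernel weights centered at the query point.
        w = np.exp(-0.5 * np.sum(diff**2, axis=1) / bandwidth**2)
        sw = np.sqrt(w)
        # Weighted design: intercept plus centered covariates.
        design = np.hstack([np.ones((n, 1)), diff]) * sw[:, None]
        coef, *_ = np.linalg.lstsq(design, y * sw, rcond=None)
        grads[i] = coef[1:]  # slope coefficients approximate the gradient
    return grads

def estimate_subspace(X, y, k, bandwidth=1.0, n_queries=300):
    """Average gradient outer products and return the top-k eigenvectors
    together with the eigenvalues in decreasing order."""
    queries = X[:n_queries]
    G = local_linear_gradients(X, y, queries, bandwidth)
    M = G.T @ G / len(G)  # empirical gradient outer product matrix
    eigvals, eigvecs = np.linalg.eigh(M)  # ascending order
    return eigvecs[:, ::-1][:, :k], eigvals[::-1]

# Synthetic data (assumed for illustration): polynomial link of degree 2.
rng = np.random.default_rng(0)
n, d = 1500, 4
b = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2)  # true index direction
X = rng.standard_normal((n, d))
t = X @ b
y = t + 0.5 * t**2 + 0.1 * rng.standard_normal(n)

B_hat, spectrum = estimate_subspace(X, y, k=1)
alignment = abs(B_hat[:, 0] @ b)  # |cosine| between estimate and truth
print(f"alignment with true direction: {alignment:.3f}")
```

In this toy model the gradient of the regression function is always a multiple of b, so the population gradient outer product has rank one and its leading eigenvector spans the CMS. The sketch says nothing about convergence rates or the d^r dimension dependence discussed above.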

Finally, we present an application that uses weak (noisy) labels to address an Individual Tree Crown (ITC) segmentation problem. Here, the objective is to delineate individual tree crowns within a 3D LiDAR scan of tropical forests, with only noisy 2D manual delineations of crowns on RGB images available as a source of weak supervision. We propose a refinement algorithm designed to improve the performance of existing unsupervised learning methods for the ITC segmentation problem.

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/6fnv-kw51
Date: January 2024
Creators: Yuan, Gan
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
