This thesis studies two important problems in transfer learning: binary classification under covariate shift, and off-policy evaluation in reinforcement learning.
First, binary classification under covariate shift is considered. The first efficient procedure for optimal pruning of a dyadic classification tree is presented, where optimality is established with respect to a notion of average discrepancy between the shifted marginal distributions of the source and target. It is further shown that the procedure adapts to the discrepancy between the marginal distributions in a neighbourhood of the decision boundary. This notion of average discrepancy can be viewed as a measure of relative dimension between distributions, and it is related to existing notions of dimension such as the Minkowski and Rényi dimensions. Experiments on real data verify the efficacy of the pruning procedure compared with other baseline methods for pruning under transfer.
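Optimal pruning of a dyadic tree can be illustrated, under simplifying assumptions, by the standard bottom-up dynamic program: when the objective is a sum of per-leaf costs, each node is collapsed into a leaf whenever that is no more costly than keeping its optimally pruned children. The sketch below is a minimal one-dimensional version on cells of [0, 1) and uses a hypothetical constant per-leaf penalty in place of the thesis's discrepancy-based criterion, which is not reproduced here. The names `Node`, `build_dyadic_tree`, `prune`, `leaves` and `majority_label` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Node:
    """A cell [index / 2**depth, (index + 1) / 2**depth) of a dyadic partition of [0, 1)."""
    depth: int
    index: int
    n0: int = 0  # number of training points with label 0 in this cell
    n1: int = 0  # number of training points with label 1 in this cell
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_dyadic_tree(xs: Sequence[float], ys: Sequence[int], max_depth: int) -> Node:
    """Build the complete dyadic tree of depth max_depth and count labels in every cell."""
    def build(depth: int, index: int) -> Node:
        node = Node(depth, index)
        if depth < max_depth:
            node.left = build(depth + 1, 2 * index)
            node.right = build(depth + 1, 2 * index + 1)
        return node

    root = build(0, 0)
    for x, y in zip(xs, ys):
        node = root
        while node is not None:  # walk from the root down to the deepest cell containing x
            if y == 1:
                node.n1 += 1
            else:
                node.n0 += 1
            # Each cell splits at its midpoint (2k + 1) / 2**(d + 1).
            mid = (2 * node.index + 1) / 2 ** (node.depth + 1)
            node = node.left if x < mid else node.right
    return root


def prune(node: Node, n: int, penalty: Callable[[Node], float]) -> float:
    """Prune the subtree in place and return its minimal cost.

    Cost of a leaf = (misclassified points) / n + penalty(leaf). The cost of a tree
    is the sum over its leaves, so optimal pruning can be done bottom-up.
    """
    leaf_cost = min(node.n0, node.n1) / n + penalty(node)
    if node.left is None or node.right is None:
        return leaf_cost
    split_cost = prune(node.left, n, penalty) + prune(node.right, n, penalty)
    if leaf_cost <= split_cost:  # ties favour the smaller tree
        node.left = node.right = None
        return leaf_cost
    return split_cost


def leaves(node: Node) -> List[Node]:
    """Return the leaves of the (possibly pruned) tree from left to right."""
    if node.left is None or node.right is None:
        return [node]
    return leaves(node.left) + leaves(node.right)


def majority_label(node: Node) -> int:
    """Label predicted in a leaf: the majority label (ties go to 0)."""
    return 1 if node.n1 > node.n0 else 0


if __name__ == "__main__":
    xs = [0.05 + 0.1 * i for i in range(10)]
    ys = [0 if x < 0.5 else 1 for x in xs]
    root = build_dyadic_tree(xs, ys, max_depth=3)
    prune(root, len(xs), penalty=lambda node: 0.05)  # hypothetical constant penalty
    print([(leaf.index / 2 ** leaf.depth, (leaf.index + 1) / 2 ** leaf.depth) for leaf in leaves(root)])
    # expected: [(0.0, 0.5), (0.5, 1.0)]
```

Because the cost is additive over leaves, a single post-order pass visits every node once, which is what makes this style of pruning efficient; a criterion that is not additive over leaves would need a different argument.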
The problem of off-policy evaluation (OPE) in reinforcement learning is then considered, and two minimax lower bounds on the mean-square error of OPE in Markov decision processes are derived. The first is a non-asymptotic lower bound for OPE in finite state and action spaces, over a model in which the mean reward may be perturbed arbitrarily up to a given magnitude; this bound depends on an average weighted chi-square divergence between the behaviour and target policies. The second is an asymptotic lower bound for OPE in continuous state spaces when the mean-reward and policy-ratio functions lie in a certain smoothness class.
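For finite action spaces, the chi-square divergence of the target policy π from the behaviour policy μ in state s is χ²(π(·|s) ‖ μ(·|s)) = Σ_a π(a|s)² / μ(a|s) − 1, which equals E_μ[(π/μ)²] − 1, the second moment of the importance ratio minus one. This is one way to see why such a divergence governs the difficulty of OPE. The abstract does not specify the weighting used in the thesis, so the minimal sketch below assumes a user-supplied distribution over states (for example, a state-visitation distribution); the function names are hypothetical.

```python
import numpy as np


def chi_square_per_state(pi, mu) -> np.ndarray:
    """chi^2(pi(.|s) || mu(.|s)) = sum_a pi(a|s)^2 / mu(a|s) - 1 for every state s.

    Rows index states and columns index actions; each row of pi and mu sums to 1.
    A state where pi puts mass on an action that mu never takes gets +inf.
    """
    pi = np.asarray(pi, dtype=float)
    mu = np.asarray(mu, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Actions with pi(a|s) = 0 contribute nothing, even if mu(a|s) = 0.
        terms = np.where(pi > 0, pi ** 2 / mu, 0.0)
    return terms.sum(axis=1) - 1.0


def weighted_chi_square(pi, mu, state_weights) -> float:
    """Average the per-state divergences under an assumed weighting over states."""
    w = np.asarray(state_weights, dtype=float)
    w = w / w.sum()
    chi = chi_square_per_state(pi, mu)
    support = w > 0  # skip zero-weight states so that 0 * inf does not produce nan
    return float(np.sum(w[support] * chi[support]))


if __name__ == "__main__":
    pi = np.array([[0.5, 0.5], [0.9, 0.1]])  # target policy
    mu = np.array([[0.5, 0.5], [0.5, 0.5]])  # behaviour policy (uniform)
    print(weighted_chi_square(pi, mu, [0.5, 0.5]))  # approximately 0.32
```

In the example the two policies agree in the first state (divergence 0) and differ in the second (divergence 0.64), so the equally weighted average is about 0.32.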
Finally, a study that purported to have derived a policy for sepsis treatment in intensive care units (ICUs) is replicated, and its results are shown to suffer from excessive variance and therefore to be unreliable. Our lower bound is computed for this setting and used as evidence that reliable off-policy estimation from these data would have required far more samples than were available.
Identifier | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/wndt-qr15 |
Date | January 2024 |
Creators | Galbraith, Nicholas R. |
Source Sets | Columbia University |
Language | English |
Type | Theses |