Recent advances in health and technology have made mobile apps a viable approach to delivering behavioral interventions in areas including physical activity encouragement, smoking cessation, substance abuse prevention, and mental health management. Because most of these disorders are chronic and mobile users are heterogeneous, interventions need to be delivered sequentially and tailored to individual needs. We operationalize the sequential decision making via a policy that takes a mobile user's past usage pattern and health status as input and outputs an app/intervention recommendation, with the goal of optimizing the cumulative rewards of interest in an indefinite horizon setting. There is a plethora of reinforcement learning methods for developing optimal policies in this setting. However, the vast majority of the literature, which comes from the computer science domain, focuses on the convergence of the algorithms given an infinite amount of data. Their performance in health applications with limited data and high noise is yet to be explored. Technically, the nature of sequential decision making results in an objective function that is non-smooth (not even Lipschitz) and non-convex in the model parameters. This poses theoretical challenges for characterizing the asymptotic properties of the optimizer of the objective function, as well as computational challenges for optimization. The problem is exacerbated by the presence of high dimensional data in mobile health applications.
In this dissertation, we propose a regularized greedy gradient Q-learning (RGGQ) method to tackle this estimation problem. The optimal policy is estimated via an algorithm that synthesizes the PGM and GGQ algorithms in the presence of an L₁ regularization, and its asymptotic properties are established. The theoretical framework initiated in this work can be applied to other non-smooth high dimensional problems in reinforcement learning.
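To make the role of the L₁ regularization concrete, the following is a minimal sketch and not the RGGQ algorithm itself. It assumes that "PGM" stands for a proximal gradient method, and it swaps in a plain semi-gradient Q-learning step with a linear Q-function in place of the GGQ update. The function names, feature layout, and default step sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_threshold(theta, tau):
    """Proximal operator of tau * ||theta||_1: shrink each entry toward zero."""
    return np.sign(theta) * np.maximum(np.abs(theta) - tau, 0.0)

def regularized_q_step(theta, phi_sa, reward, phi_next_all,
                       gamma=0.9, lr=0.1, lam=0.01):
    """One illustrative update: a Q-learning-style step, then L1 shrinkage.

    Assumption: this is a generic stand-in, not the dissertation's update.
    theta        : weights of a linear Q-function, Q(s, a) = phi(s, a) @ theta
    phi_sa       : feature vector of the observed state-action pair
    phi_next_all : feature matrix of the next state, one row per action
    """
    # The max over actions is what makes the objective non-smooth in theta.
    greedy_value = np.max(phi_next_all @ theta)
    td_error = reward + gamma * greedy_value - phi_sa @ theta
    # Gradient-type step on the temporal-difference error.
    theta = theta + lr * td_error * phi_sa
    # Proximal step for the L1 penalty; small weights are set exactly to zero.
    return soft_threshold(theta, lr * lam)
```

The shrinkage step is what produces sparse weight vectors, which is one common reason to pair an L₁ penalty with high dimensional features.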
Identifier | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/d8-zv13-2p78 |
Date | January 2021 |
Creators | Lu, Xiaoqi |
Source Sets | Columbia University |
Language | English |
Detected Language | English |
Type | Theses |