Interactive Q-learning
Kristin Linn
Department of Statistics
North Carolina State University
Engr Room 1602
4400 University Drive, Fairfax, VA 22030
Time: 10:30 A.M. - 11:30 A.M.
Date: Wednesday, Jan 22, 2014
Abstract
Evidence-based rules for optimal treatment allocation are key components in the quest for efficient, effective health care delivery. Q-learning, an approximate dynamic programming algorithm, is a popular method for estimating optimal sequential decision rules from data. Q-learning requires modeling nonsmooth, nonmonotone transformations of the data, complicating the search for adequately expressive, yet parsimonious, statistical models. The default Q-learning working model is multiple linear regression, which is misspecified under most data-generating models. We propose an alternative strategy for estimating optimal sequential decision rules for which the requisite statistical modeling does not depend on nonsmooth, nonmonotone transformed data and is thus amenable to established statistical approaches for exploratory data analysis, model building and validation. We derive the new method, Interactive Q-learning (IQ-learning), via an interchange in the order of certain steps in Q-learning. IQ-learning performs favorably in simulated experiments, and an illustrative case study is provided using data from a sequentially randomized trial studying depression therapies.
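As a rough illustration of the standard two-stage Q-learning procedure the abstract refers to (not the IQ-learning method itself), the following minimal sketch fits linear working models by least squares. The function names, the -1/+1 treatment coding, and the layout of the feature matrices are assumptions made for this example, not details taken from the talk. The absolute value in the pseudo-outcome is the nonsmooth, nonmonotone transformation that the first-stage regression must then model.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares fit; returns the coefficient vector."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def q_learning_two_stage(H1, A1, H2, A2, Y):
    """Two-stage Q-learning with linear working models (illustrative sketch).

    H1, H2: (n, p1) and (n, p2) history features, each with an intercept column.
    A1, A2: length-n treatment vectors, assumed coded as -1 or +1.
    Y:      length-n final outcome, where larger values are better.
    """
    # Stage 2: regress Y on main effects H2 and treatment interactions A2 * H2.
    X2 = np.hstack([H2, A2[:, None] * H2])
    beta2 = fit_linear(X2, Y)
    p2 = H2.shape[1]
    main2 = H2 @ beta2[:p2]
    contrast2 = H2 @ beta2[p2:]

    # Pseudo-outcome: the fitted stage-2 Q-function maximized over a2 in {-1, +1}.
    # max over a2 of (main + a2 * contrast) equals main + |contrast|.
    pseudo = main2 + np.abs(contrast2)

    # Stage 1: regress the pseudo-outcome on H1 and A1 * H1.
    X1 = np.hstack([H1, A1[:, None] * H1])
    beta1 = fit_linear(X1, pseudo)
    return beta1, beta2

def decision_rule(H, beta):
    """Recommend +1 where the estimated treatment contrast is nonnegative, else -1."""
    p = H.shape[1]
    return np.where(H @ beta[p:] >= 0, 1, -1)
```

Calling decision_rule(H2, beta2) gives the estimated second-stage rule and decision_rule(H1, beta1) the first-stage rule. Because the pseudo-outcome contains an absolute value of a linear function of the second-stage history, a linear first-stage model will generally be misspecified, which is the difficulty the abstract describes.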