Approximate Dynamic Programming:
A Rolling Horizon Heuristic
Dynamic models have a wide range of applications in financial systems, supply chain management, and natural systems such as climate and ecosystems. The status of the system in these models is described by a set of characteristics known as state variables. The state of the system evolves over time, driven either by the decisions of the system's users or by uncontrolled changes that affect it at random. Because of the contingent nature of this evolution, evaluating the future status of a system with a large number of state variables requires estimating a reward-to-go function, which can be computationally very expensive.
Developing fast and easy-to-use heuristics for finding optimal policies in complex real-world problems has been an active area of research in computer science and operations research in recent years. Dynamic Programming (DP) and Reinforcement Learning (RL) are the main methods concerned with learning optimal decisions in a dynamic environment. While DP algorithms are usually model-based and offline, most RL algorithms are model-free and online. Model-based offline algorithms use explicit assumptions about the transition dynamics and reward function to pre-calculate the value function and store it, so that the optimal action can be looked up at any given state. In model-free online methods, by contrast, the optimal action is computed at decision time from the observed transition and reward data. Value iteration and policy iteration are two of the most common offline methods for finding optimal policies, while rolling horizon is commonly used as an online strategy.
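As an illustration of the offline, model-based approach, the sketch below runs value iteration on a toy two-state, two-action Markov decision process. The transition matrix `P`, reward matrix `R`, and discount factor are invented for the example and are not taken from this project.

```python
import numpy as np

# Toy model (illustrative only): P[a, s, s2] is the probability of
# moving from state s to s2 under action a; R[s, a] is the reward.
P = np.array([
    [[0.9, 0.1],
     [0.2, 0.8]],   # transitions under action 0
    [[0.5, 0.5],
     [0.7, 0.3]],   # transitions under action 1
])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9  # discount factor

# Offline value iteration: repeat Bellman backups until convergence,
# then store the greedy policy for cheap lookup at decision time.
V = np.zeros(2)
for _ in range(10_000):
    Q = R + gamma * np.einsum('axy,y->xa', P, V)  # Q[s, a]
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-10:
        break
    V = V_new

policy = Q.argmax(axis=1)  # optimal action pre-computed for every state
```

Because the value function is a contraction under the Bellman backup, the loop converges geometrically; the pre-computed `policy` table is what an offline method would store and consult online.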
In this project, part of my PhD thesis under the supervision of Dr. Valerie Thomas at the Georgia Institute of Technology, we focus on rolling horizon algorithms for finite horizon problems. We use a multistep lookahead heuristic to estimate the reward-to-go at a given state and use this approximation to find the optimal action. We apply this method to a dynamic model of the economy and climate change, analyze the impact of the discount factor on the choice of approximation, and provide insight into the robustness of the approximation.
Publications:
1. Shayegh S., Thomas V. M., "Adaptive stochastic integrated assessment modeling of optimal greenhouse gas emission reductions", Climatic Change, 2015
2. Shayegh S., Thomas V. M., "Combined Online and Offline Methods: A Multistep Lookahead Algorithm for Approximate Dynamic Programming"

Presentations:
1. Shayegh S., Thomas V. M., "Stochastic integrated modeling of climate change", INFORMS Annual Meeting, Minneapolis, 2013
An example of the two-step-ahead algorithm
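A two-step-ahead rule can be sketched as follows: from the current state, expand the decision tree two transitions deep, approximate the reward-to-go at the leaves (here simply by zero), and commit only to the first action of the best two-step plan, re-solving at every decision epoch. The toy transition and reward data below are illustrative assumptions, not the project's climate-economy model.

```python
import numpy as np

# Illustrative two-state, two-action model (not the project's model).
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
    [[0.5, 0.5], [0.7, 0.3]],   # transitions under action 1
])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])      # R[s, a]
gamma = 0.9  # discount factor

def lookahead(s, depth):
    """Best expected discounted reward over `depth` steps, assuming a
    zero reward-to-go at the leaves of the lookahead tree."""
    if depth == 0:
        return 0.0
    return max(
        R[s, a] + gamma * sum(P[a, s, s2] * lookahead(s2, depth - 1)
                              for s2 in range(2))
        for a in range(2)
    )

def two_step_action(s):
    # Commit only to the first action of the best two-step plan;
    # the rule is re-solved online at every decision epoch.
    q = [R[s, a] + gamma * sum(P[a, s, s2] * lookahead(s2, 1)
                               for s2 in range(2))
         for a in range(2)]
    return int(np.argmax(q))
```

In this toy model all rewards are non-negative, so deepening the lookahead can only raise the estimated value; in general, how well a truncated lookahead approximates the true reward-to-go depends on the discount factor, which is the sensitivity analyzed in the project.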