Reinforcement learning with functional representation


In reinforcement learning, an intelligent and autonomous agent is placed in an environment. It observes the state of the environment and takes actions. After each action, the agent receives a reward from the environment and transitions to a new state. The goal of the agent is to learn a state-to-action mapping from its state-action-reward history, so that its accumulated long-term reward is maximized. This state-to-action mapping is usually called a policy.

In real-world tasks, such as driving vehicles and manipulating robotic hands, the mapping from states to actions is commonly highly complex and rarely linear. We expect the agent to adaptively learn a policy that fits such complex situations. Functional representation, in which a function is represented as a combination of basis functions, is a powerful tool for learning non-linear functions.
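To make the idea concrete, here is a minimal sketch of a functional representation: a non-linear function expressed as a weighted sum of basis functions. The Gaussian radial basis functions, their centers, and the weights below are illustrative choices, not taken from the papers.

```python
import numpy as np

def gaussian_basis(x, center, width=1.0):
    # One basis function: a Gaussian bump centered at `center`.
    return np.exp(-((x - center) ** 2) / (2 * width ** 2))

def functional_representation(x, centers, weights):
    # f(x) = sum_i w_i * phi_i(x): a combination of basis functions.
    return sum(w * gaussian_basis(x, c) for c, w in zip(centers, weights))

# Illustrative centers and weights; more basis functions give a richer function.
centers = [0.0, 1.0, 2.0]
weights = [0.5, -1.0, 2.0]
value = functional_representation(1.5, centers, weights)
```

Learning in this representation means adjusting, and possibly adding, basis functions rather than tuning a fixed parameter vector.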



The functional gradient method has been shown to be powerful in supervised learning, in the form of boosting algorithms. We present the PolicyBoost approach, which learns a policy in the functional policy space, resulting in a stable non-linear policy learning method that handles both discrete and continuous state and action spaces. For details please see:
Yang Yu, Peng-Fei Hou, Qing Da, and Yu Qian. Boosting nonparametric policies. In: Proceedings of the 2016 International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'16), Singapore, 2016. (PDF)

The code used in the experiments: (Code Download in Zip, 50KB)
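The boosting-style construction can be sketched as follows. This is not the authors' PolicyBoost implementation; it is a toy, hedged illustration of the underlying idea: the policy is represented as a sum of base models, and each round a new base model is fitted to pseudo-gradient targets (here simply residuals on a toy objective) and appended to the ensemble, i.e., one step of gradient ascent in function space.

```python
import numpy as np

class Stump:
    """A depth-1 regression tree on 1-D inputs: a minimal base learner."""
    def fit(self, X, y):
        order = np.argsort(X)
        Xs, ys = X[order], y[order]
        best = (np.inf, 0.0, ys.mean(), ys.mean())
        for i in range(1, len(Xs)):
            left, right = ys[:i], ys[i:]
            err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if err < best[0]:
                best = (err, (Xs[i - 1] + Xs[i]) / 2, left.mean(), right.mean())
        _, self.threshold, self.left_val, self.right_val = best
        return self

    def predict(self, X):
        return np.where(X <= self.threshold, self.left_val, self.right_val)

class BoostedPolicy:
    """Policy in functional form: F(s) = sum_t eta * h_t(s)."""
    def __init__(self, learning_rate=0.5):
        self.models = []
        self.eta = learning_rate

    def preference(self, states):
        # Evaluating the policy requires invoking every base model.
        out = np.zeros_like(states, dtype=float)
        for h in self.models:
            out += self.eta * h.predict(states)
        return out

    def boost_step(self, states, pseudo_gradients):
        # Fit a new base model to the (estimated) functional gradient
        # and append it: one functional-gradient ascent step.
        self.models.append(Stump().fit(states, pseudo_gradients))

# Toy usage: drive the action preference toward sign(state).
rng = np.random.default_rng(0)
policy = BoostedPolicy()
states = rng.uniform(-1, 1, size=200)
for _ in range(20):
    target = np.sign(states)                      # stand-in gradient target
    residual = target - policy.preference(states)
    policy.boost_step(states, residual)
```

In actual policy learning, the pseudo-gradient targets would be estimated from sampled trajectories rather than a fixed supervised target.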



A practical difficulty of learning a functionally represented policy is that training accumulates many basis functions, all of which must be invoked in every evaluation of the policy output. Since the policy is evaluated repeatedly during both the training and prediction stages of reinforcement learning, a functionally represented policy incurs a large time cost from evaluating every constituent basis function. We therefore proposed the napping mechanism to reduce this cost. The idea is to periodically replace the learned function with a simple approximation function during the learning process: for a given policy formed by a set of models, an approximation model is obtained by mimicking the input-output behavior of the policy. For details please see:
Qing Da, Yang Yu, and Zhi-Hua Zhou. Napping for functional representation of policy. In: Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS'14), Paris, France, 2014. (PDF)

The code used in the experiments: (Code Download in Zip, 30KB)
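The napping idea can be sketched as below. This is an illustrative toy, not the paper's code: the large ensemble is queried on sampled states, and a single cheap surrogate (here a deliberately simple piecewise-constant binning model on 1-D states) is fitted to mimic its input-output mapping, so that subsequent policy evaluations invoke one model instead of many.

```python
import numpy as np

class BinnedApproximator:
    """A simple surrogate: partitions the 1-D state range into bins and
    stores the ensemble's average output per bin."""
    def fit(self, states, outputs, n_bins=32):
        self.edges = np.linspace(states.min(), states.max(), n_bins + 1)
        idx = np.clip(np.digitize(states, self.edges) - 1, 0, n_bins - 1)
        self.values = np.array(
            [outputs[idx == b].mean() if np.any(idx == b) else 0.0
             for b in range(n_bins)])
        return self

    def predict(self, states):
        idx = np.clip(np.digitize(states, self.edges) - 1, 0, len(self.values) - 1)
        return self.values[idx]

def nap(ensemble_predict, sample_states):
    """Query the (possibly large) ensemble on sampled states, then fit a
    single surrogate that mimics its input-output behavior."""
    outputs = ensemble_predict(sample_states)
    return BinnedApproximator().fit(sample_states, outputs)

# Toy usage: an "ensemble" of 500 basis functions is replaced by one surrogate.
rng = np.random.default_rng(1)
centers = rng.uniform(-1, 1, size=500)
ensemble = lambda s: np.sum(np.exp(-(s[:, None] - centers) ** 2 / 0.1), axis=1)

probe = rng.uniform(-1, 1, size=1000)
surrogate = nap(ensemble, probe)   # one cheap model instead of 500 evaluations
```

The surrogate is refitted periodically as the ensemble grows, keeping the per-evaluation cost of the policy roughly constant throughout learning.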